Abstract

We present the first determination of parton distributions of the nucleon at NLO 
and NNLO based on a global data set which includes LHC data: NNPDF2.3. Our data 
set includes, besides the deep inelastic, Drell-Yan, gauge boson production and jet data 
already used in previous global PDF determinations, all the relevant LHC data for which 
experimental systematic uncertainties are currently available: ATLAS and LHCb W and Z 
rapidity distributions from the 2010 run, CMS W electron asymmetry data from the 2011 
run, and ATLAS inclusive jet cross-sections from the 2010 run. We introduce an improved 
implementation of the FastKernel method which allows us to fit to this extended data set, 
and also to adopt a more effective minimization methodology. We present the NNPDF2.3 
PDF sets, and compare them to the NNPDF2.1 sets to assess the impact of the LHC 
data. We find that all the LHC data are broadly consistent with each other and with 
all the older data sets included in the fit. We present predictions for various standard 
candle cross-sections, and compare them to those obtained previously using NNPDF2.1, 
and specifically discuss the impact of ATLAS electroweak data on the determination of 
the strangeness fraction of the proton. We also present collider PDF sets, constructed 
using only data from HERA, Tevatron and LHC, but find that this data set is neither 
precise nor complete enough for a competitive PDF determination. 
1 Introduction



The most accurate available information on the parton distribution functions (PDFs) 
of the nucleon, an essential ingredient for hadron collider phenomenology [IHS], comes 
from global fits to extended sets of data obtained from a variety of different electro- and 
hadroproduction processes, in particular, deep inelastic scattering (DIS), Drell-Yan (DY) 
and gauge boson production, and jet production. The combination of the information 
from all these processes allows one to determine six independent light quark and
antiquark distributions and the gluon. Data from the LHC are likely to offer very significant
improvements in the accuracy of these determinations, because of their greater precision 
and kinematic coverage, and also because of the greater range of precise cross-sections 
available, including processes such as gauge boson production in association with jets and 
charm which hitherto have not been used for PDF determinations. 

In fact, data for processes relevant for PDF determination collected at the LHC during
the first run in 2010 already reached an accuracy comparable to that of pre-existing data.
In Ref. [HE] we presented a first study of the impact on PDFs of LHC data, specifically
the 36 pb$^{-1}$ W-lepton asymmetry data from ATLAS and CMS, then available without
full information on correlated systematics. We constructed a PDF set, NNPDF2.2, by
reweighting [Bj] the NNPDF2.1 NLO and NNLO PDF sets [7, 8]. Even with the modest
amount of new experimental information then available, small improvements in the
accuracy and changes in the shape of light-quark distributions in the medium- and small-$x$
region were found.

Meanwhile, several LHC data sets with full information on correlated systematics have 
been published, in particular gauge boson production data from ATLAS, CMS and LHCb, 
and inclusive jet and dijet data from ATLAS. Preliminary studies [S] with some of these 
data have shown that, thanks to the information on correlated systematics, their impact 
on PDFs is significant: if included by reweighting the NNPDF2.1 set, one has to start 
with a large number of replicas in order to obtain accurate results. A new set of PDFs 
including this considerable amount of new information is thus needed, with the new data 
included in the fit, rather than added at a second stage via reweighting. 

It is the purpose of this paper to present such a PDF determination, using the
methodology developed by the NNPDF collaboration pMTT], and used to produce the NNPDF2.1
LO, NLO, and NNLO PDF sets [TIH], which are now part of the PDF4LHC prescription [H]
for reliable determination of PDF uncertainties in LHC processes. The new PDF set
presented here, NNPDF2.3, is the most accurate determination to date from the NNPDF
family, and it supersedes all previous sets. It differs from the NNPDF2.1 set
because of the inclusion of LHC data, and also because of some improvements in fitting
methodology, specifically in the genetic algorithm which is used to determine the fit to
the replicas.

We will determine NNPDF2.3 PDFs both at NLO and NNLO for a wide range of values
of $\alpha_s$, so that users can select their own preferred value. While the default NNPDF sets
use a variable-flavor-number general-mass scheme based on the FONLL [19–21] method
for the inclusion of heavy quark masses, in which the number of active flavors increases at
each quark threshold, we also provide PDF sets in which the maximum number of active
flavors is fixed at $n_f = 4$ or $n_f = 5$, which may be useful for specific applications (see
for example Refs. [22, 23]). We will also provide NNPDF2.3 NLO and NNLO sets based
on reduced data sets: an NNPDF2.3 noLHC set, which uses the same data set as NNPDF2.1
(apart from three NMC data points, see Sec. 2), and differs from it only because of the improved
methodology; and an NNPDF2.3 collider set, which only uses HERA inclusive and charm DIS data,
Tevatron gauge boson production and jet data, and LHC data, and excludes all fixed-target data.
The former set is useful for the precise assessment of the impact of LHC data, and also
whenever one wishes to avoid using LHC data, for example for unbiased new physics
searches at the LHC. The latter set is interesting because the fixed-target data, both for DIS
and DY, are problematic: they are generally at lower scales (hence many of them are
potentially subject to undetermined higher-twist corrections), some carry no information
on correlated systematics, and finally many of them (such as NMC and BCDMS and all
neutrino DIS data, and Tevatron DY data) are partly or fully obtained using nuclear
targets which may be subject to significant but unknown nuclear corrections. Jet data,
which are affected by larger theoretical uncertainties, currently play a relatively minor
role, so a collider-only fit is at present theoretically more reliable than the global fit;
unfortunately, we find that it is not yet competitive in terms of statistical precision.

The paper is organized as follows. In Sec. 2 we summarize the general features
of the LHC data which are being added to the data set, and specifically the choice of
kinematic cuts. In Sec. 3 we discuss the methodological improvements which are being
introduced in the NNPDF2.3 determination. In order to cope with the non-negligible
widening of the data set (the number of data for hadron collider processes, which are
computationally the most intensive since they depend quadratically on the PDFs, is
considerably increased in comparison to NNPDF2.1), we introduce here (Secs. 3.1–3.2) a new,
more efficient implementation of our previous FastKernel method [17] for the computation
of hadronic observables. The ensuing considerable improvement in computational
efficiency allows us to switch to a new choice of settings for the genetic algorithm which is
used for minimization, which is computationally rather more intensive but leads to more
precise results (Sec. 3.3). In Sec. 4 we discuss the NNPDF2.3 PDF set: after summarizing
the statistical features of the fit, we present the PDFs and compare them to our
previous NNPDF2.1 set, specifically by separating the effect of the LHC data from that
of the improved methodology, through a comparison which also involves the NNPDF2.3
noLHC set. The analysis of the impact of LHC data is expanded upon in Sec. 5. First,
we use the reweighting technique to determine quantitatively the amount of information
introduced in the fit by LHC data and their degree of consistency with the rest of the data
set, and examine how the description of the LHC data improves when they are included in
the fit. Then, we study in detail the compatibility of collider data (including those from
the LHC) with fixed-target data. We also specifically address the issue of the amount
of strangeness in the nucleon, which has attracted some attention recently [24], and in
particular show that even though ATLAS data seem to favour a somewhat larger strange
fraction, once uncertainties are properly accounted for there is no incompatibility between
ATLAS and fixed-target data. Finally, in Sec. 6, after briefly discussing NNPDF2.3 parton
luminosities, we compare several standard candle cross-sections obtained with them
to those obtained with NNPDF2.1.
2 Experimental data



In this section we describe the data set used in the NNPDF2.3 analysis. The non-LHC
data are the same as in the corresponding NNPDF2.1 NLO and NNLO fits: these include
NMC [25, 26], BCDMS [27, 28] and SLAC [29] deep inelastic scattering (DIS) fixed-target
data; the combined HERA-I DIS data set [30], HERA $F_L$ [31] and structure function
data [32–38], ZEUS HERA-II DIS cross-sections [39, 40], CHORUS inclusive neutrino
DIS [41], and NuTeV dimuon production data [42, 43]; fixed-target E605 [44] and E866 [45–47]
Drell-Yan production data; CDF W asymmetry [48] and CDF [49] and D0 [50] Z
rapidity distributions; and CDF [51] and D0 [52] Run-II one-jet inclusive cross-sections.
The kinematical cuts are unchanged from NNPDF2.1, so we will not review these data
sets here. Instead we will focus on the features of the LHC data included in NNPDF2.3.
First we discuss electroweak gauge boson production and then inclusive jets. In each case
we describe the NLO and NNLO codes and the corresponding settings used to compute
the theoretical predictions for each of these data sets, while in the next section we describe
the practical implementation of these calculations in the FastKernel method used in the
NNPDF fitting code.

The LHC has already provided an impressive set of measurements which are sensitive
to parton distributions, mostly from the 2010 run based on a total integrated luminosity of
36 pb$^{-1}$: inclusive jet and dijet data [53–55], electroweak vector boson production [56–61],
both inclusive and in association with heavy quarks [62], and direct photon production,
both inclusive and associated with jets [63–65]. Several other measurements from the 2011
and 2012 runs, which are very relevant for PDF fits, will become available in the coming months,
such as the CMS and LHCb low-mass Drell-Yan differential distributions [66, 67] and the
inclusive jets and dijets from ATLAS and CMS [55].

Precise determination of PDFs, adequate to current needs, requires the use of
experimental data which come with a full covariance matrix. In NNPDF2.3 we include all
currently available LHC data for which the experimental covariance matrix has been
provided: the ATLAS W and Z lepton rapidity distributions from the 2010 data set, the CMS
W electron asymmetry from the 2011 data set and the LHCb W lepton rapidity
distributions from the 2010 data set [61], together with the ATLAS inclusive jet cross-sections
from the 2010 run [55].

Tevatron Run II lepton asymmetry data from W production were included in Ref. [6]
by reweighting the NNPDF2.0 PDF set. Some of these data had issues of compatibility
with the rest of the NNPDF2.0-NNPDF2.1 data set (these two PDF sets only differ in the
treatment of heavy quark mass terms), and they only had a moderate impact on PDF
uncertainties and essentially no effect on the PDF shapes. We therefore prefer not to include
these data in the NNPDF2.3 set, and concentrate on the impact of LHC data.

Also, for the time being we do not increase the set of physical processes which are being 
used for PDF determination. In particular, we choose to use the inclusive jet rather than 
the dijet cross-sections from ATLAS: to use both would be double counting because they 
share the same underlying raw data. In principle, dijet cross-sections carry more detailed
information on the underlying parton kinematics; however, they are subject to significant
scale uncertainties (see for example Ref. [69] and references therein). In the future, as
more data with full systematics become available, it is likely that the inclusion of new
processes in PDF determination will be advantageous. This will require the development
of suitable fast interfaces for these processes. An example is prompt photon production,
whose impact on the gluon determination was studied recently in Ref. [70].

Figure 1: The kinematical coverage of the experimental data used in the NNPDF2.3 PDF
determination.

The kinematical coverage of the LHC data sets included in the NNPDF2.3 analysis and
the corresponding average experimental uncertainties for each data set are summarized in
Tab. 1. A scatter plot of the kinematical plane for all experimental data in NNPDF2.3
is shown in Fig. 1. The LHC electroweak data span a larger range in Bjorken-$x$ than the
Tevatron data thanks to the extended rapidity coverage (up to $\eta = 4.5$), while the inclusive
jets span a much wider kinematical range, both in $x$ and in $Q^2$, than the one accessible at the
Tevatron.
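To make the jet entries of Fig. 1 concrete, the following is a minimal sketch, assuming leading-order 2→2 kinematics with massless partons. A jet with transverse momentum $p_T$ and rapidity $\eta$ recoiling against a second jet at rapidity $\eta_2$ fixes $x_{1,2} = (p_T/\sqrt{s})(e^{\pm\eta} + e^{\pm\eta_2})$. When only one jet is measured, the smallest $x$ is the infimum over $\eta_2$, namely $x = (p_T/\sqrt{s})\,e^{-|\eta|}$ (this ignores the requirement $x_1 \le 1$, so it is a lower bound). The function names and the numerical values ($p_T = 20$ GeV, $\eta = 4.4$, $\sqrt{s} = 7$ TeV) are illustrative choices of ours, not taken from the data.

```python
import math

def lo_momentum_fractions(pt, eta1, eta2, sqrt_s):
    """LO 2->2 momentum fractions (x1, x2) for two massless jets with common pT."""
    x1 = pt / sqrt_s * (math.exp(eta1) + math.exp(eta2))
    x2 = pt / sqrt_s * (math.exp(-eta1) + math.exp(-eta2))
    return x1, x2

def x_min_inclusive_jet(pt, eta, sqrt_s):
    """Lower bound on x when only one jet (pT, eta) is measured.

    Obtained by sending the unobserved recoil jet to infinite rapidity on the
    far side: x_min = (pT / sqrt(s)) * exp(-|eta|).
    """
    return pt / sqrt_s * math.exp(-abs(eta))

pt, eta, sqrt_s = 20.0, 4.4, 7000.0   # illustrative values, GeV
x1, x2 = lo_momentum_fractions(pt, eta, 4.0, sqrt_s)
print(f"x_min = {x_min_inclusive_jet(pt, eta, sqrt_s):.2e}, x2(eta2 = 4) = {x2:.2e}")
```

Any physical recoil configuration gives $x_2$ above the bound, which is why the plot shows only this minimum value per data point.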

In Tab. 2 we also give the total number of data points used for PDF fitting, both for
the NLO and the NNLO global sets, and for the various other PDF sets discussed in Sec. 4.
Note that the NLO and NNLO noLHC data sets differ from those of the NNPDF2.1 NLO
and NNLO fits of Refs. [7, 8] because of the inclusion in the NNPDF2.3 data set of three
NMC data points which were inadvertently neglected in the NNPDF2.1 fits.

^ For the one-jet inclusive cross-sections we consider here, the parton kinematics is not fixed even at
leading order. Therefore, we plot only the minimum value of $x$ of the parton with smallest $x$, which is
given by $x = (p_T/\sqrt{s})\,e^{-|\eta|}$ in terms of the transverse momentum $p_T$ and rapidity $\eta$ of the jet and the
center-of-mass energy $\sqrt{s}$ of the hadronic collision.
Data Set                          Ref.   N_dat   Rapidity range   <σ_stat> (%)   <σ_sys> (%)   <σ_norm> (%)
CMS W electron asy 840 pb⁻¹       [58]   11      [0, 2.4]         2.1            4.7           –
ATLAS W+ 36 pb⁻¹                  [56]   11      [0, 2.4]         1.4            1.3           3.4
ATLAS W- 36 pb⁻¹                  [56]   11      [0, 2.4]         1.6            1.4           3.4
ATLAS Z 36 pb⁻¹                   [56]   8       [0, 3.2]         2.8            2.4           3.4
LHCb W+ 36 pb⁻¹                   [61]   5       [2, 4.5]         4.7            11.1          3.4
LHCb W- 36 pb⁻¹                   [61]   5       [2, 4.5]         3.4            7.8           3.4
ATLAS Inclusive Jets 36 pb⁻¹      [55]   90      [0, 4.5]         10.2           23.4          3.4

Table 1: The number of data points, kinematical coverage and average statistical, systematic and
normalization percentage uncertainties for each of the experimental LHC data sets considered for
the NNPDF2.3 analysis. ATLAS inclusive jets refers to the R = 0.4 data set. There are 141 LHC
data points altogether.



Fit                         NLO    NNLO
NNPDF2.3 noLHC              3341   3360
NNPDF2.3 Collider only      1212   1231
NNPDF2.3                    3482   3501



Table 2: Total number of data points for the various global sets used for PDF fitting. 
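Since NNPDF2.3 and NNPDF2.3 noLHC differ only by the LHC data, the $N_{\rm dat}$ column of Tab. 1 should equal the difference between the corresponding totals of Tab. 2. A few lines of arithmetic confirm this at both perturbative orders; the dictionary keys below are just labels we chose for the rows of the two tables.

```python
# N_dat for each LHC data set, as listed in Tab. 1.
lhc_ndat = {
    "CMS W electron asymmetry 840 pb^-1": 11,
    "ATLAS W+ 36 pb^-1": 11,
    "ATLAS W- 36 pb^-1": 11,
    "ATLAS Z 36 pb^-1": 8,
    "LHCb W+ 36 pb^-1": 5,
    "LHCb W- 36 pb^-1": 5,
    "ATLAS inclusive jets 36 pb^-1": 90,
}
# Totals from Tab. 2, as (NLO, NNLO).
totals = {"noLHC": (3341, 3360), "global": (3482, 3501)}

n_lhc = sum(lhc_ndat.values())
diffs = {}
for order, idx in (("NLO", 0), ("NNLO", 1)):
    diffs[order] = totals["global"][idx] - totals["noLHC"][idx]
    print(order, "LHC points:", n_lhc, "Tab. 2 difference:", diffs[order])
```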



2.1 Electroweak boson production 

ATLAS has measured the W lepton and Z rapidity distributions from 36 pb$^{-1}$, and
provides the full experimental covariance matrix [56]. This measurement supersedes the
original muon asymmetry measurement from W decays [57], for which the covariance
matrix was not available, and also adds the Z rapidity distributions, which are closely tied
to the W lepton distributions by the cross-correlated systematic uncertainties. The CMS
collaboration has presented a measurement of the electron asymmetry with 840 pb$^{-1}$ [58],
which supersedes the 36 pb$^{-1}$ data [59] and also provides the experimental covariance
matrix. CMS has presented a measurement of the normalized Z rapidity distribution with
36 pb$^{-1}$ [60], but the covariance matrix is not available. Finally, the LHCb collaboration
has presented results for the W and Z lepton rapidity distributions from the 2010
data set [61], again with the experimental covariance matrix; however, the Z data are not
included in our determination because they are in the process of being reanalyzed [71].
The theoretical predictions for LHC electroweak boson production have been
computed at NLO with MCFM [72, 73] interfaced with the APPLgrid library for fast NLO
calculations [74]. As discussed in Ref. [8] (see in particular Sec. 3.2), NNLO predictions
are obtained by means of local $K$-factors. These have been computed using the DYNNLO
code [75], and are found to be quite small, of order 2% at most, and slowly varying with
the lepton rapidity.
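Local $K$-factors amount to a bin-by-bin rescaling of the fast NLO prediction. The sketch below assumes that $K_I$ is the ratio of NNLO to NLO predictions in bin $I$, computed once with a fixed reference PDF set and then held fixed during the fit; Ref. [8] describes the prescription actually used. The function names and all numbers are hypothetical, chosen only so that the $K$-factors stay within the 2% size quoted above.

```python
def local_k_factors(nnlo_ref, nlo_ref):
    """K_I = sigma_NNLO,I / sigma_NLO,I, bin by bin, from one reference PDF set."""
    if len(nnlo_ref) != len(nlo_ref):
        raise ValueError("reference predictions must share the same binning")
    return [nnlo / nlo for nnlo, nlo in zip(nnlo_ref, nlo_ref)]

def approximate_nnlo(nlo_pred, k_factors):
    """Rescale a fast NLO prediction (computed with the current PDFs) by fixed K_I."""
    return [k * s for k, s in zip(k_factors, nlo_pred)]

# Hypothetical predictions in three lepton-rapidity bins (pb), not from the text.
nlo_ref = [600.0, 610.0, 620.0]
nnlo_ref = [606.0, 619.15, 632.4]
k = local_k_factors(nnlo_ref, nlo_ref)
print([round(x, 4) for x in k])
print(approximate_nnlo([590.0, 605.0, 615.0], k))
```

Because the $K_I$ are fixed numbers, the NNLO prediction costs no more to evaluate than the NLO one during minimization.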

The calculation of NLO cross-sections requires the implementation of cuts on the lepton 
kinematics. For the ATLAS data, these are the following: 

• cuts for the W lepton rapidity distributions

$p_T^l > 20$ GeV, $p_T^\nu > 25$ GeV, $m_T > 40$ GeV, $|\eta_l| < 2.5$;

• cuts for the Z rapidity distribution

$p_T^l > 20$ GeV, $66 < m_{ll} < 116$ GeV, $|\eta_l| < 4.9$.

^ Note also that in Ref. [61] there is a typo in the data point for the 3.5–4 rapidity bin, which
reads 125 ± 5tf while it ought to be 125 ± 5tf [71].

ATLAS measures separately the rapidity distributions in both the electron and muon 
channels, and then combines them into a common data set. The above kinematical cuts 
correspond to the combination of electrons and muons, but differ from the cuts applied in 
individual leptonic channels. For Z rapidity distributions we have explicitly verified that 
results are unchanged if the cut on the rapidity of the leptons from the Z decay is removed. 

For the CMS W electron asymmetry, the only cut is

$p_T^e > 35$ GeV,

with the same binning in electron rapidity as in Ref. [58].
Finally, for the LHCb W data the kinematical cuts are

$p_T^\mu > 20$ GeV, $2.0 < \eta_\mu < 4.5$.
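The fiducial cuts listed above can be expressed as simple event-selection predicates, which is how they enter the NLO calculation of each data bin. The sketch below takes the threshold values from the text; the assignment of each threshold to a variable follows our reading of the cut lists, the transverse-mass formula is the standard one, and the requirement that both Z leptons pass the $p_T$ and $|\eta|$ cuts is an assumption. All function names are hypothetical and do not correspond to MCFM or DYNNLO settings.

```python
import math

def transverse_mass(pt_l, pt_nu, dphi):
    """W transverse mass m_T = sqrt(2 pT^l pT^nu (1 - cos dphi)), in GeV."""
    return math.sqrt(2.0 * pt_l * pt_nu * (1.0 - math.cos(dphi)))

def passes_atlas_w(pt_l, pt_nu, dphi, eta_l):
    """ATLAS W: pT^l > 20 GeV, pT^nu > 25 GeV, m_T > 40 GeV, |eta_l| < 2.5."""
    return (pt_l > 20.0 and pt_nu > 25.0
            and transverse_mass(pt_l, pt_nu, dphi) > 40.0
            and abs(eta_l) < 2.5)

def passes_atlas_z(leptons, m_ll):
    """ATLAS Z: each lepton has pT > 20 GeV and |eta| < 4.9; 66 < m_ll < 116 GeV.

    leptons is a sequence of two (pT, eta) pairs.
    """
    return (all(pt > 20.0 and abs(eta) < 4.9 for pt, eta in leptons)
            and 66.0 < m_ll < 116.0)

def passes_cms_w_electron(pt_e):
    """CMS W electron asymmetry: pT^e > 35 GeV."""
    return pt_e > 35.0

def passes_lhcb_w(pt_mu, eta_mu):
    """LHCb W: pT^mu > 20 GeV and 2.0 < eta_mu < 4.5."""
    return pt_mu > 20.0 and 2.0 < eta_mu < 4.5

print(passes_atlas_w(30.0, 30.0, math.pi, 1.0))   # back to back: m_T = 60 GeV
print(passes_atlas_w(30.0, 30.0, 0.5, 1.0))       # small opening angle: m_T below 40 GeV
print(passes_atlas_z([(40.0, 0.5), (35.0, -4.0)], 91.2))
print(passes_lhcb_w(25.0, 1.9))                   # muon outside the LHCb acceptance
```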

For all three data sets, we performed extensive cross-checks at NLO using two different
codes, DYNNLO and MCFM: we checked that, once common settings are adopted, the results
of the MCFM and DYNNLO runs agree to better than 1% for all the data bins. An even
more accurate agreement could be reached in principle [76], but it is computationally very
costly and unnecessary for our purposes. In the particular case of the ATLAS W and Z
distributions, we also found good agreement with the APPLgrid tables used in the recent
HERAFitter analysis of ATLAS data [24].

2.2 Inclusive jet production 

The Tevatron jet data play an important role in constraining the gluon distribution. The
kinematic coverage of this constraint is extended considerably by the LHC jet data.
From the 2010 36 pb$^{-1}$ data set, inclusive jet and dijet production have been measured by
CMS [53, 54] and ATLAS [55]; however, only ATLAS provides the full experimental covariance
matrix. The covariance matrix is particularly important for these data because their
systematic uncertainties are highly correlated.

The theoretical calculation of jet production cross-sections in hadron collisions can
be carried out by exclusive parton-level Monte Carlo codes such as NLOjet++ [77] and
EKS-MEKS [78, 79]. More recently, the NLO calculation matched to parton showers in the
context of the POWHEG framework has also become available [80], allowing direct
hadron-level comparisons between theory and data. On the other hand, the full NNLO
corrections to inclusive jet production are unknown, and only the threshold corrections to
the inclusive jet $p_T$ distribution are available [81]; thus the inclusion of jet data in an
NNLO analysis is necessarily approximate. Hadron collider jet production data can be
consistently included at NLO within a global PDF analysis framework using fast NLO
grid codes such as FastNLO [82, 83] or APPLgrid [74].

We compute inclusive jet cross-sections using NLOjet++ interfaced to APPLgrid. The
jet reconstruction parameters are identical to those used in the experimental analysis [84].
The NLO calculation uses the anti-$k_t$ algorithm [85], and the factorization and
renormalization scales are set equal to $p_T^{\rm max}$, the transverse momentum of the hardest jet in each event.
We choose to include in the analysis the data with $R = 0.4$. These data are less sensitive
to nonperturbative corrections from the underlying event and pileup than the
$R = 0.6$ data [86], and though they are somewhat more sensitive to hadronization effects,
all in all the nonperturbative parton-to-hadron correction factors are smaller for $R = 0.4$
than for $R = 0.6$. We have checked that the results are essentially unchanged, both in
terms of impact on PDFs and at the level of the description of the data, if the $R = 0.6$ data
are used instead of the $R = 0.4$ data.

On top of the 86 sources of fully correlated systematic errors, the ATLAS jet spectra
have an additional source of uncertainty due to the theoretical uncertainty in the
computation of the hadron-to-parton nonperturbative correction factors. We take these
nonperturbative corrections and their associated uncertainties from the ATLAS analysis,
where they are obtained from the variations of different leading-order Monte Carlo
programs. It is clear from Ref. [87] that for a given Monte Carlo model the nonperturbative
correction is strongly correlated between data bins, and thus we conservatively treat
it as an additional source of fully correlated systematic uncertainty, to be added to the
covariance matrix.
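Treating the nonperturbative correction as one more fully correlated source simply adds an outer-product term to the covariance matrix. The sketch below assumes the usual construction $C_{ij} = \delta_{ij}\sigma_i^2 + \sum_k s_{ik} s_{jk}$, where $\sigma_i$ is the uncorrelated uncertainty of bin $i$ and $s_{ik}$ is the absolute shift of bin $i$ under source $k$. It also assumes that the nonperturbative shift is the relative uncertainty on the correction factor times a central value; which central values to scale is left open. Normalization uncertainties are omitted. Function names and numbers are hypothetical.

```python
def covariance_matrix(stat, corr_sys):
    """C_ij = delta_ij stat_i^2 + sum_k s_ik s_jk.

    stat     : N absolute uncorrelated uncertainties (statistical, plus any
               uncorrelated systematics already added in quadrature)
    corr_sys : list of sources, each a list of N absolute shifts s_ik
    """
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = stat[i] ** 2
    for shifts in corr_sys:
        if len(shifts) != n:
            raise ValueError("each correlated source needs one shift per bin")
        for i in range(n):
            for j in range(n):
                cov[i][j] += shifts[i] * shifts[j]
    return cov

def with_np_source(corr_sys, values, rel_np_unc):
    """Append the nonperturbative-correction uncertainty as one more fully correlated source."""
    return corr_sys + [[r * v for r, v in zip(rel_np_unc, values)]]

data = [100.0, 50.0, 20.0]          # hypothetical cross-sections in three bins, pb
stat = [1.0, 1.0, 1.0]
sys_a = [[2.0, 1.0, 0.5]]           # one hypothetical correlated systematic
cov = covariance_matrix(stat, with_np_source(sys_a, data, [0.05, 0.04, 0.03]))
print(cov[0][0], cov[0][1])
```

The extra source enlarges both diagonal and off-diagonal entries, which is what makes this treatment conservative compared with adding the nonperturbative uncertainty to the diagonal only.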

Because NNLO corrections to jet cross-sections are not available, hadron collider jet
data can only be included in an NNLO fit within some approximations. As in NNPDF2.1,
NNLO theoretical predictions for CDF and D0 inclusive jet data are obtained using the
approximate NNLO matrix element obtained from threshold resummation [81] as
implemented in the FastNLO framework [82, 83]. For ATLAS data the threshold approximation
is expected to be worse because of the higher centre-of-mass energy, and thus we simply
use the NLO matrix element with NNLO PDFs and $\alpha_s$. It was checked in Ref. [8]
that, for Tevatron data, the difference between fits with approximate NNLO jet matrix
elements and fits with purely NLO matrix elements is significantly smaller than the
difference between fits with and without jet data.
3 Methodological improvements



We discuss some methodological improvements introduced in the current NNPDF release.
First, we present a new efficient method which has allowed us to speed up considerably the
computation of hadronic observables while maintaining full NLO accuracy for all
experimental data. As explained below, it is a refinement of the FastKernel method introduced
in Ref. [17]. The rest of the section describes some improvements of the minimization
algorithm, which allow for a more extensive exploration of the space of parameters, and thus
a more accurate minimization. These new settings are computationally more intensive,
and are made possible by the new implementation of the FastKernel method.

3.1 The FastKernel method revisited: deep inelastic scattering 

We first briefly review the FastKernel method in the simplest case of deep inelastic
scattering, closely following the original description of the algorithm in Ref. [17]. In the
FastKernel method the PDFs at the initial scale $Q_0$ are transformed into the evolution
basis [13], in which all nonsinglet combinations decouple, and only the singlet and gluon
evolve, coupled to each other. The basis of parametrized PDFs is trivially related to the
evolution basis through a linear transformation. The PDFs in the evolution basis at the
initial scale are denoted by $N_i^0(x)$. The index $i$ ranges from 1 to $N_{\rm pdf} = 13$, though only
the light PDFs are independently parametrized, heavy quarks being generated dynamically
during the perturbative evolution. Observables are denoted by $\sigma_I$, where $I$ is an
index that runs over the data points included in the fit. Each data point $\sigma_I$ is
characterized by a set of kinematic variables. For a DIS observable the kinematic variables
characterizing the data points are $(x_I, Q_I^2)$, and the observable itself can be written as

$$\sigma_I = \sum_{i=1}^{N_{\rm pdf}} K_{Ii} \otimes N_i^0 , \qquad (1)$$

where $K_{Ii}$ is a kernel obtained by convoluting the coefficient functions $C_{Ij}$ for $\sigma_I$ with the
DGLAP evolution kernels $\Gamma_{ji}$, so

$$K_{Ii}\left(x_I, \alpha_s(Q_I^2), \alpha_s(Q_0^2)\right) = \sum_{j} C_{Ij}\left(x_I, \alpha_s(Q_I^2)\right) \otimes \Gamma_{ji}\left(x_I, \alpha_s(Q_I^2), \alpha_s(Q_0^2)\right) . \qquad (2)$$

The idea behind the FastKernel method is to approximate the PDFs at the initial scale
by a linear combination of interpolating functions

$$N_i^0(x) = \sum_{\alpha=1}^{N_x} N_{i\alpha}^0 \, \mathcal{I}^{(\alpha)}(x) . \qquad (3)$$

The coefficients of the linear combination are given by the value of the function $N_i^0(x)$
computed on a grid of points, $N_{i\alpha}^0 = N_i^0(x_\alpha)$. The index $\alpha$ runs from 1 to $N_x$, the number
of points in the $x$-grid. The same grid of $x$ values is used for all data points $I$. More details
on the choice of the grid and of the interpolating functions can be found in Ref. [17], where
the choices that guarantee an accurate interpolation at a sensible computational cost are
described. The idea of expanding PDFs in terms of a basis of interpolating polynomials in
order to perform PDF evolution more efficiently is the same as that used in the evolution
programs HOPPET [88] and QCDNUM [89]. Using the interpolated PDFs, and writing
explicitly the convolution in Eq. (1), yields

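$$\sigma_I = \sum_{i=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_x} \left( \int_{x_I}^1 \frac{dy}{y} \, K_{Ii}\left(\frac{x_I}{y}, \alpha_s(Q_I^2), \alpha_s(Q_0^2)\right) \mathcal{I}^{(\alpha)}(y) \right) N_{i\alpha}^0 . \qquad (4)$$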

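The key observation now is that the integral is independent of the values of the $N_{i\alpha}^0$,
and can thus be precomputed. Denoting the integral in Eq. (4) by $\Sigma_{Ii\alpha}$, we thus obtain

$$\sigma_I = \sum_{\alpha, i} \Sigma_{Ii\alpha} N_{i\alpha}^0 = \Sigma_I \cdot N^0 . \qquad (5)$$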

This expression is a simple scalar product, with the dot indicating that the suppressed
indices $(\alpha, i)$ have been contracted. The coefficients $N_{i\alpha}^0$ are stored as an array of real
numbers. In an actual fit, this array is updated every time the PDFs at the initial scale
are changed. For each data point $I$, the coefficients $\Sigma_{Ii\alpha}$ do not change when the PDFs are
updated, so they can be precomputed offline and stored. Note that, for any given choice
of $I$ and $\alpha$, the integral in Eq. (4) vanishes if $\mathcal{I}^{(\alpha)}(y) = 0$ in the interval $[x_I, 1]$. As a
consequence, the array $\Sigma_{Ii\alpha}$ contains many zeroes. The computation of the observables is
optimized by including only the nonvanishing terms in the scalar product in Eq. (5), and
can thus be evaluated very rapidly.
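The DIS construction of Eqs. (3)–(5) can be illustrated with a toy in which there is a single flavor ($N_{\rm pdf} = 1$), so the index $i$ drops out. This is a minimal sketch under several assumptions of ours: a hypothetical smooth kernel $K(z) = 1 + z$ stands in for the true kernel of Eq. (2), which contains distributions; the interpolating functions are piecewise linear in $\ln x$ rather than the polynomials of Ref. [17]; and the integral is done with a trapezoid rule in $t = \ln y$. The FK table is stored sparsely, skipping the $(I, \alpha)$ pairs whose interpolating function vanishes on $[x_I, 1]$, and the fast scalar product is checked against a direct convolution of the interpolated PDF.

```python
import math

def kernel(z):
    """Toy, hypothetical smooth kernel K(z); real DIS kernels contain distributions."""
    return 1.0 + z

def make_grid(x_min=1e-3, n_x=20):
    """Log-spaced grid x_1 < ... < x_{N_x} = 1 (an illustrative choice)."""
    lmin = math.log(x_min)
    return [math.exp(lmin * (1.0 - a / (n_x - 1))) for a in range(n_x)]

def hat(alpha, x, grid):
    """Piecewise-linear interpolating function I^(alpha)(x) in ln x: 1 at x_alpha, 0 at other nodes."""
    t, ta = math.log(x), math.log(grid[alpha])
    if alpha > 0:
        tl = math.log(grid[alpha - 1])
        if tl <= t <= ta:
            return (t - tl) / (ta - tl)
    if alpha < len(grid) - 1:
        tr = math.log(grid[alpha + 1])
        if ta <= t <= tr:
            return (tr - t) / (tr - ta)
    return 0.0

def convolve(x_i, f, n_steps=400):
    """(K x f)(x_i) = int_{x_i}^1 dy/y K(x_i/y) f(y), trapezoid rule in t = ln y."""
    t0 = math.log(x_i)
    h = -t0 / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        y = math.exp(t0 + k * h)
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * kernel(x_i / y) * f(y)
    return total * h

def build_fk_table(x_data, grid):
    """Sparse FK table {(I, alpha): Sigma_{I alpha}}, computed once, offline."""
    table = {}
    for i_dat, x_i in enumerate(x_data):
        for alpha in range(len(grid)):
            # I^(alpha) vanishes on [x_I, 1] when its support ends at or below x_I.
            if alpha + 1 < len(grid) and grid[alpha + 1] <= x_i:
                continue
            table[(i_dat, alpha)] = convolve(x_i, lambda y, a=alpha: hat(a, y, grid))
    return table

def predict(table, n0, n_data):
    """sigma_I = sum_alpha Sigma_{I alpha} N0_alpha, over the stored entries only."""
    sigma = [0.0] * n_data
    for (i_dat, alpha), s in table.items():
        sigma[i_dat] += s * n0[alpha]
    return sigma

def toy_pdf(x):
    """Hypothetical initial-scale PDF."""
    return x ** -0.2 * (1.0 - x) ** 3

grid = make_grid()
x_data = [0.01, 0.05, 0.2]
n0 = [toy_pdf(x) for x in grid]        # N0_alpha = N(x_alpha): changes at every fit step
table = build_fk_table(x_data, grid)    # independent of the PDF parameters
fast = predict(table, n0, len(x_data))
# Cross-check: convolve the interpolated PDF directly, point by point.
slow = [convolve(x, lambda y: sum(n0[a] * hat(a, y, grid) for a in range(len(grid))))
        for x in x_data]
print(len(table), "stored entries out of", len(x_data) * len(grid))
print(fast)
print(slow)
```

In a fit only `predict` is called repeatedly; the expensive integrals live in the table, and the number of stored entries shrinks as $x_I$ grows.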

It is worth noting that, within this framework, all the theoretical inputs are encoded
in the arrays $\Sigma_{Ii\alpha}$, called FK tables in the following discussion. Any variation of the
parameters (e.g. $\alpha_s$, CKM matrix elements, electroweak parameters, mass thresholds),
renormalization scheme, or renormalization or factorization scales is implemented by generating
a new set of $\Sigma_{Ii\alpha}$. Each data set then has several FK tables associated with it, each table
corresponding to one particular choice of theoretical inputs.

3.2 The FK method for hadronic collider data 

As described in Ref. [17], the FastKernel method can also be applied to hadronic data, and
indeed from NNPDF2.0 onwards all the fixed-target Drell-Yan data and the Tevatron W
and Z data were included at full NLO accuracy using this technique. Approaches similar
to the FastKernel method have been applied by various other groups to the NLO
calculation of hadron collider observables. For example, the FastNLO [82, 83] and APPLgrid [74]
collaborations provide software tools which are capable of performing efficient NLO QCD
calculations for a variety of hard scattering processes, such as jet production and electroweak
boson production.

In all these frameworks, Monte Carlo weights from an appropriate event generator,
such as NLOjet++ [77] or MCFM [90], are stored, partonic subprocess by subprocess, on an
interpolating grid in $x$ and $Q^2$. With this grid the calculation of the observable is
reduced to simple products and sums; the parton distributions may then be varied without
incurring a large computational overhead.
As an illustration, we review the procedure implemented in the APPLgrid code.
The calculation of a collider observable is performed as



$$\sigma = \sum_{l=0}^{N_{\rm sub}} \sum_{\alpha,\beta} \sum_{\tau} W^{(l)}_{\alpha\beta\tau} \, F^{(l)}\left(x_\alpha, x_\beta, Q_\tau^2\right) , \qquad (6)$$

where the indices $\alpha, \beta$ run over points in the $x$-space grid, $\tau$ runs over points in $Q^2$, and $l$
denotes the specific parton-level subprocess. The table $W^{(l)}_{\alpha\beta\tau}$ contains the values of the Monte
Carlo weights for a particular subprocess, and the $F^{(l)}$ are the incoming subprocess parton
luminosities, constructed as a bilinear combination of PDFs appropriate for the subprocess
under consideration.
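Eq. (6) is a triple sum of stored weights times luminosities. The sketch below writes each luminosity as a bilinear form in the flavor-basis PDFs with coefficients $D^{(l)}_{mn}$, the same form used later in this section. Arrays are nested Python lists indexed as `W[l][a][b][tau]`, `D[l][m][n]` and `pdf[m][a][tau]`. This layout, the function names and the toy numbers are our own choices and do not reflect the APPLgrid interface.

```python
from itertools import product

def luminosity(D_l, pdf, a, b, tau):
    """F^(l)(x_a, x_b, Q_tau^2) = sum_{m,n} D^(l)_{mn} f_m(x_a, Q_tau^2) f_n(x_b, Q_tau^2)."""
    n_fl = len(pdf)
    return sum(D_l[m][n] * pdf[m][a][tau] * pdf[n][b][tau]
               for m in range(n_fl) for n in range(n_fl))

def observable(W, D, pdf):
    """sigma = sum_l sum_{a,b} sum_tau W^(l)_{a b tau} F^(l)(x_a, x_b, Q_tau^2), as in Eq. (6)."""
    total = 0.0
    for l, W_l in enumerate(W):
        n_x, n_tau = len(W_l), len(W_l[0][0])
        for a, b, tau in product(range(n_x), range(n_x), range(n_tau)):
            total += W_l[a][b][tau] * luminosity(D[l], pdf, a, b, tau)
    return total

# Toy inputs: 2 flavors, 2 x-points, a single Q^2 point, 1 subprocess.
pdf = [[[1.0], [2.0]],                   # f_0 at x_0 and x_1
       [[3.0], [4.0]]]                   # f_1 at x_0 and x_1
D = [[[0.0, 1.0], [1.0, 0.0]]]           # subprocess receives f_0 f_1 + f_1 f_0
W = [[[[1.0], [0.0]], [[0.0], [0.5]]]]   # W^(0)_{a b tau}
print(observable(W, D, pdf))
```

Here the PDFs must be re-evaluated on the full $(x_\alpha, Q_\tau^2)$ grid every time they change, which is the cost the FK approach removes.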

Methods such as the one described above, and exemplified by Eq. (6), already provide
fast and efficient NLO QCD calculations of hadronic observables, and were adopted by
NNPDF starting with the NNPDF2.0 global fit [17]. However, it is still possible to
reduce considerably the number of floating-point operations required in a fit by combining
this procedure with the evolution of the PDFs. Such a combined approach for hadronic
processes, which we call the FK approach, is extremely fast: it allows the precomputation
of everything in the calculation except the dependence on the PDFs at the initial scale, and
reduces the final computational task to scalar products similar to that described above for DIS.

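The first step of the FK method for hadronic data is to construct a table of evolution
coefficients by taking the convolution of the DGLAP evolution kernels with a suitable
interpolation basis. The procedure is identical to the case of DIS FastKernel tables
summarized above, albeit without the additional convolution with coefficient functions. Using
the same interpolating functions, we first introduce an evolution matrix

$$E^{\tau}_{\alpha\beta ij} = \int_{x_\alpha}^1 \frac{dy}{y} \, \Gamma_{ij}\left(\frac{x_\alpha}{y}, \alpha_s(Q_\tau^2), \alpha_s(Q_0^2)\right) \mathcal{I}^{(\beta)}(y) . \qquad (7)$$

Evolution from the initial scale in the evolution basis is then once again reduced to a scalar
product:

$$N_i(x_\alpha, Q_\tau^2) = \sum_{\beta, j} E^{\tau}_{\alpha\beta ij} N_{j\beta}^0 = E^{\tau}_{\alpha i} \cdot N^0 , \qquad (8)$$

where the dot again denotes an implicit sum, now over $(\beta, j)$. Introducing a suitable matrix
$R$ to rotate to the flavor basis, which is more convenient for constructing the parton luminosities
required in Eq. (6), we can write a PDF $f_n$, $n = 1, \dots, N_{\rm pdf}$, in the flavor basis at the scale $Q_\tau^2$
in the form

$$f_n(x_\alpha, Q_\tau^2) = \sum_{i=1}^{N_{\rm pdf}} R_{ni} N_i(x_\alpha, Q_\tau^2) = \sum_{i=1}^{N_{\rm pdf}} \sum_{\beta, j} R_{ni} E^{\tau}_{\alpha\beta ij} N_{j\beta}^0 = A^{\tau}_{\alpha n} \cdot N^0 , \qquad (9)$$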

^ The corresponding formalism for the FastKernel and FastNLO methods is very similar, with some
technical differences.
where

$$A^{\tau}_{\alpha\beta nj} = \sum_{i=1}^{N_{\rm pdf}} R_{ni} E^{\tau}_{\alpha\beta ij} \qquad (10)$$

is now the rotated evolution matrix, and again the dot denotes an implicit sum over $(\beta, j)$.
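Eqs. (8)–(10) state that evolving and then rotating gives the same flavor-basis PDFs as applying the precomputed rotated matrix $A^\tau$ once. The sketch below checks this numerically at a single scale $Q_\tau^2$, with random toy arrays standing in for $E^\tau$, $R$ and $N^0$. To make the index roles visible it uses different sizes for the evolution basis and the flavor basis (in the text both have $N_{\rm pdf} = 13$ components). The array layout `E_tau[a][b][i][j]` and the function names are our own choices.

```python
import random

def evolve(E_tau, N0):
    """Eq. (8): N[i][a] = sum_{b,j} E^tau_{a b i j} N0[j][b]."""
    n_x, n_ev = len(E_tau), len(N0)
    return [[sum(E_tau[a][b][i][j] * N0[j][b] for b in range(n_x) for j in range(n_ev))
             for a in range(n_x)] for i in range(n_ev)]

def rotate(R, N):
    """f[n][a] = sum_i R[n][i] N[i][a]: evolution basis to flavor basis."""
    return [[sum(R[n][i] * N[i][a] for i in range(len(N))) for a in range(len(N[0]))]
            for n in range(len(R))]

def rotated_evolution(R, E_tau):
    """Eq. (10): A^tau_{a b n j} = sum_i R[n][i] E^tau_{a b i j}."""
    n_x, n_ev, n_fl = len(E_tau), len(R[0]), len(R)
    return [[[[sum(R[n][i] * E_tau[a][b][i][j] for i in range(n_ev)) for j in range(n_ev)]
              for n in range(n_fl)] for b in range(n_x)] for a in range(n_x)]

def apply_rotated(A_tau, N0):
    """Eq. (9): f_n(x_a, Q_tau^2) = A^tau_{a n} . N0."""
    n_x, n_fl, n_ev = len(A_tau), len(A_tau[0][0]), len(N0)
    return [[sum(A_tau[a][b][n][j] * N0[j][b] for b in range(n_x) for j in range(n_ev))
             for a in range(n_x)] for n in range(n_fl)]

rng = random.Random(0)
n_x, n_ev, n_fl = 3, 2, 3   # toy sizes
E_tau = [[[[rng.uniform(-1, 1) for _ in range(n_ev)] for _ in range(n_ev)]
          for _ in range(n_x)] for _ in range(n_x)]
R = [[rng.uniform(-1, 1) for _ in range(n_ev)] for _ in range(n_fl)]
N0 = [[rng.uniform(0, 1) for _ in range(n_x)] for _ in range(n_ev)]
two_step = rotate(R, evolve(E_tau, N0))              # evolve, then rotate
one_step = apply_rotated(rotated_evolution(R, E_tau), N0)
print(max(abs(p - q) for rp, rq in zip(two_step, one_step) for p, q in zip(rp, rq)))
```

Because $R$ does not depend on the PDFs, folding it into $A^\tau$ once removes one contraction from every evaluation.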

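Having factorized the DGLAP evolution of the incoming parton distributions, it is now
simple to construct the required subprocess luminosities for the observable. Firstly,

$$F^{(l)}(x_\alpha, x_\beta, Q_\tau^2) = \sum_{m,n}^{N_{\rm pdf}} D^{(l)}_{mn} \, f_m(x_\alpha, Q_\tau^2) \, f_n(x_\beta, Q_\tau^2) , \qquad (11)$$

where the $D^{(l)}_{mn}$ are coefficients giving the nonzero contributions of the flavor combination
$f_m f_n$ to the subprocess in question. Substituting Eq. (9) into Eq. (11) then gives

$$F^{(l)}(x_\alpha, x_\beta, Q_\tau^2) = \sum_{m,n}^{N_{\rm pdf}} D^{(l)}_{mn} \left(A^{\tau}_{\alpha m} \cdot N^0\right) \left(A^{\tau}_{\beta n} \cdot N^0\right) , \qquad (12)$$

and thus, upon substitution into Eq. (6), with the $x$-grid indices relabeled $\gamma, \delta$,

$$\sigma = \sum_{l=0}^{N_{\rm sub}} \sum_{\gamma,\delta} \sum_{\tau} \sum_{m,n}^{N_{\rm pdf}} W^{(l)}_{\gamma\delta\tau} D^{(l)}_{mn} \left(A^{\tau}_{\gamma m} \cdot N^0\right) \left(A^{\tau}_{\delta n} \cdot N^0\right) . \qquad (13)$$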

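The PDF evolution, now made explicit, may be absorbed into the Monte Carlo weight
grid, allowing a great deal more of the calculation of Eq. (13) to be precomputed: if we
define an FK grid

$$\Sigma_{\alpha\beta ij} = \sum_{l=0}^{N_{\rm sub}} \sum_{\tau} \sum_{m,n}^{N_{\rm pdf}} \sum_{\gamma,\delta} W^{(l)}_{\gamma\delta\tau} D^{(l)}_{mn} A^{\tau}_{\gamma\alpha mi} A^{\tau}_{\delta\beta nj} , \qquad (14)$$

the hadronic observable can be evaluated as

$$\sigma = \sum_{\alpha,\beta}^{N_x} \sum_{i,j}^{N_{\rm pdf}} \Sigma_{\alpha\beta ij} N_{i\alpha}^0 N_{j\beta}^0 = N^0 \cdot \Sigma \cdot N^0 . \qquad (15)$$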

This compact expression shows that the computation of hadronic observables is now
reduced to a sum of bilinear products over a grid in $x$-space and over the basis of input PDFs,
in complete analogy with the DIS case presented above.

The coefficients $\mathrm{FK}_{\alpha i\beta j}$ are the FK tables for hadronic collider processes. As discussed
above, they encode all the theoretical inputs introduced in the calculation of a given
observable. Any variation in these inputs can be included in the fit by generating a new
FK table, while the rest of the code is left unchanged.
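Eqs. (13)-(15) only reorder finite sums, so the precomputed FK table must reproduce the direct evaluation exactly. The sketch below checks this numerically with NumPy. The arrays `W`, `D`, `A` and `N0` are random stand-ins with hypothetical small dimensions, and the index layout is our own choice; it is not the format of actual FK tables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small dimensions, for illustration only
n_ord, n_sub = 2, 3      # perturbative orders l, subprocesses s
n_x, n_q = 5, 4          # x-grid points (alpha, beta, gamma, delta), scales tau
n_pdf = 3                # flavors m, n and evolution-basis PDFs i, j

W = rng.normal(size=(n_ord, n_sub, n_x, n_x, n_q))   # W^{(l)(s)}_{gamma delta tau}
D = rng.normal(size=(n_sub, n_pdf, n_pdf))           # D^{(s)}_{mn}
A = rng.normal(size=(n_q, n_x, n_x, n_pdf, n_pdf))   # A^tau_{gamma alpha m i}
N0 = rng.normal(size=(n_x, n_pdf))                   # N^0_i(x_alpha)

# Eq. (13): evolve to every scale tau first, then build luminosities and convolve
f = np.einsum("tgami,ai->tgm", A, N0)                # f_m(x_gamma, Q^2_tau)
sigma_direct = np.einsum("lsgdt,smn,tgm,tdn->", W, D, f, f)

# Eq. (14): precompute the FK table once; it does not depend on N0
FK = np.einsum("lsgdt,smn,tgami,tdbnj->aibj", W, D, A, A, optimize=True)

# Eq. (15): each new set of initial-scale PDFs now costs one bilinear product
sigma_fk = np.einsum("ai,aibj,bj->", N0, FK, N0)

print(np.isclose(sigma_direct, sigma_fk))  # True
```

Once `FK` has been built, every new candidate set of initial-scale PDFs (for instance each mutant produced during the minimization) costs only the bilinear product of Eq. (15). This is why the more demanding minimization settings discussed below become affordable.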

From a practical point of view, to obtain the FK tables entering Eq. (15) we first need to
obtain the partonic weights of Eq. (6) for a given experimental data set, and then combine
these partonic grids with the interpolated PDF evolution coefficients using Eq. (14).
Note that in previous NNPDF fits the evolution and the coefficient functions were not
combined together in this way. For the data sets considered in this paper, we have used
the following codes for the partonic weights, as discussed in Sec. 2:
W+ and W- distributions [pb]

|η_l|         W+ FK      W+ APPLgrid   ε_rel    W- FK      W- APPLgrid   ε_rel
0.00-0.21     617.287    617.345       0.01%    456.540    456.819       0.06%
0.21-0.42     616.988    617.062       0.01%    453.045    453.315       0.06%
0.42-0.63     620.237    620.290       0.01%    448.902    449.172       0.06%
0.63-0.84     624.192    624.235       0.01%    441.789    442.045       0.06%
0.84-1.05     630.235    630.286       0.01%    432.206    432.435       0.05%
1.05-1.37     636.835    636.886       0.01%    419.027    419.222       0.05%
1.37-1.52     642.800    642.861       0.01%    403.908    404.084       0.04%
1.52-1.74     642.499    642.569       0.01%    390.564    390.724       0.04%
1.74-1.95     642.351    642.437       0.01%    377.328    377.473       0.04%
1.95-2.18     628.592    628.693       0.02%    359.373    359.498       0.03%
2.18-2.50     590.961    591.079       0.02%    337.255    337.366       0.03%

Z distribution [pb]

|y|          FK         APPLgrid   ε_rel
0.0-0.4      124.634    124.633    0.001%
0.4-0.8      123.478    123.488    0.01%
0.8-1.2      121.079    121.108    0.02%
1.2-1.6      118.057    118.108    0.04%
1.6-2.0      113.512    113.549    0.03%
2.0-2.4      106.552    106.562    0.01%
2.4-2.8      93.7637    93.7838    0.02%
2.8-3.6      55.8421    55.8538    0.02%

ATLAS 2010 jets [pb]

p_T (GeV)    FK               APPLgrid         ε_rel
20-30        6.1078 × 10^6    6.1090 × 10^6    0.02%
30-45        986285           9.8654 × 10^5    0.03%
45-60        190487           190556           0.04%
60-80        48008.7          48029.7          0.04%
80-110       10706.6          10710.4          0.03%
110-160      1822.62          1822.87          0.01%
160-210      303.34           303.443          0.03%
210-260      76.1127          76.1338          0.03%

Table 3: The FK results for some of the LHC data included in the NNPDF2.3 analysis, compared
with the original APPLgrid interfaced to LHAPDF, for the same PDF set. The tables show the
comparison for the ATLAS differential cross-sections for W and Z production, where the average relative
discrepancy over the whole W/Z data set is 0.03% and the maximum relative discrepancy is 0.06%.
They also show the corresponding results for some selected bins of the theoretical predictions for
the ATLAS 2010 jet data in the first rapidity bin $|y| < 0.3$, where in this case the average relative
discrepancy over the whole data set is 0.03% and the maximum relative discrepancy is 0.2%.

• For the fixed-target Drell-Yan data and the Tevatron W and Z data we use the
FastKernel tables from Ref. [12]. These observables are now calculated using Eq. (15),
giving exactly the same result as previously, but in a fraction of the time.

• For the Tevatron Run II CDF and D0 inclusive jet production we use the tables
provided by FastNLO. Again these observables are calculated using Eq. (15), giving
the same result as in previous NNPDF fits but in a fraction of the time.

• For the ATLAS 2010 inclusive jet data we use the tabulated partonic cross-sections
from the APPLgrid program.

• For the LHC electroweak vector boson production data, we have computed new
APPLgrid partonic cross-section tables, using the built-in interface to the MCFM
program.

Given the crucial role played by the FK tables in our current fitting procedure, a
careful benchmark of their accuracy is mandatory whenever they are generated. It is clear
from Eqs. (14)-(15) that our prescription for computing the observables is identical to
the original formula, Eq. (6), except that we have changed the order of the sums in
order to precompute all the terms that do not depend on the PDFs at the initial scale.
Benchmarking the tables is then straightforward, since the observables computed with the
FK tables must agree with those computed using APPLgrid/FastNLO, provided the two
procedures use the same PDFs as an input.

For the hadronic observables that were already in previous NNPDF fits, namely fixed 
target Drell-Yan and Tevatron electroweak boson production, the FK tables have been 
computed on exactly the same grid points in x, and therefore the agreement between 
the old and the new computations is at the level of the machine precision. For the new 
LHC jet and electroweak observables computed using the APPLgrid interface, the grid of
$x$ points used in the NNPDF analysis is different from the original grids used by the other
packages. Therefore the comparison is only accurate to the precision of the interpolating
functions. Results for the ATLAS 2010 jets double-differential cross-section [55] and for
the ATLAS W/Z differential cross-sections [56], computed at a sample of kinematical points,
are compared in Tab. 3. It is clear that the numerical accuracy is more than satisfactory for
the requirements of precision phenomenology.
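As a quick cross-check of Tab. 3, the relative discrepancies can be recomputed from the tabulated values. The sketch below assumes $\epsilon_{\rm rel} = |\sigma_{\rm FK} - \sigma_{\rm APPLgrid}|/\sigma_{\rm APPLgrid}$; taking APPLgrid as the reference is our assumption. It uses the W- column of the table. Rounded to two decimals, each value matches the corresponding $\epsilon_{\rm rel}$ entry.

```python
# FK and APPLgrid predictions for the W- bins of Tab. 3 (in pb)
fk = [456.540, 453.045, 448.902, 441.789, 432.206, 419.027,
      403.908, 390.564, 377.328, 359.373, 337.255]
applgrid = [456.819, 453.315, 449.172, 442.045, 432.435, 419.222,
            404.084, 390.724, 377.473, 359.498, 337.366]


def rel_discrepancy(value, reference):
    """|value - reference| / |reference|."""
    return abs(value - reference) / abs(reference)


eps = [rel_discrepancy(a, b) for a, b in zip(fk, applgrid)]
print([f"{100 * e:.2f}%" for e in eps])
print(f"max = {100 * max(eps):.3f}%, mean = {100 * sum(eps) / len(eps):.3f}%")
```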

3.3 Improved minimization 

We have introduced some new settings for the minimization, which allow for a more 
extensive exploration of the space of parameters and thus a more accurate result. Some 
of these new settings are computationally more intensive, and thus only made possible by 
the implementation of the FK method described in the previous section. 

In the NNPDF2.1 fits, different parameters were chosen for the genetic algorithm
at NLO and NNLO: specifically, at NNLO the numbers of mutations and mutants were
increased, in order to cope with the more complex shape of NNLO coefficient functions.
For NNPDF2.3 we use the same parameters (number of mutations and mutation rates)
at NLO and NNLO. We choose them to be the same as in NNPDF2.1 NNLO, and we
summarize them in Tab. 4. We refer to Sec. 4.2 of Ref. [8], Sec. 4.3 of Ref. [11] and Sec. 4.2 of
Ref. [13] for a more detailed discussion of the genetic algorithm and the definition of these
parameters. As mentioned, at NLO this choice corresponds to an increased number of mutants
and mutations, and thus to a more detailed exploration of parameter space.
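To make concrete what the "number of mutants" and "number of mutations" control, here is a generic one-generation step of a genetic algorithm. This is a simplified illustration under our own assumptions: uniform random shifts scaled by a rate drawn from a list (mimicking the per-PDF rate lists of Tab. 4) and plain best-of-generation selection. It is not the NNPDF implementation, whose details are given in the references above. The function `ga_generation` and the toy settings are hypothetical.

```python
import numpy as np


def ga_generation(params, chi2, n_mutants, n_mutations, rates, rng):
    """One generation of a generic genetic algorithm (illustrative sketch only).

    params      : 1D float array, parameters of the current best candidate
    chi2        : callable returning the figure of merit of a parameter vector
    n_mutants   : number of mutated copies generated in this generation
    n_mutations : number of parameters changed in each copy
    rates       : possible mutation sizes; one is drawn at random per mutation
    """
    best, best_chi2 = params, chi2(params)
    for _ in range(n_mutants):
        mutant = params.copy()
        for k in rng.choice(params.size, size=n_mutations, replace=False):
            mutant[k] += rng.choice(rates) * rng.uniform(-1.0, 1.0)
        value = chi2(mutant)
        if value < best_chi2:  # the parent survives unless it is beaten
            best, best_chi2 = mutant, value
    return best, best_chi2


# Toy usage: minimize a quadratic with hypothetical settings
rng = np.random.default_rng(0)
target = np.ones(5)


def toy_chi2(p):
    return float(np.sum((p - target) ** 2))


p = np.zeros(5)
c0 = c = toy_chi2(p)
for _ in range(50):
    p, c = ga_generation(p, toy_chi2, n_mutants=80, n_mutations=2,
                         rates=(1.0, 0.1), rng=rng)
print(f"chi2: {c0:.2f} -> {c:.4f}")
```

More mutants per generation and more mutations per mutant both increase the number of figure-of-merit evaluations. This is where the cost of the NNPDF2.3 settings comes from.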

Also, we have modified the criterion for dynamical stopping by making it a little more
stringent, which means that stopping happens on average at a somewhat later stage: we
take $r_v - 1 = 4 \times 10^{-?}$ at NLO and $r_v - 1 = 3 \times 10^{-?}$ at NNLO, to be compared with the
respective values $r_v - 1 = 3 \times 10^{-?}$ and $r_v - 1 = 2 \times 10^{-?}$ of the NNPDF2.1 fits, discussed
in Sec. 4.6 of Ref. [11] (NNPDF2.0 NLO, unchanged in NNPDF2.1) and Sec. 4.2 of Ref. [8]
(NNPDF2.1 NNLO). We have also increased the maximum number of genetic algorithm
generations, at which the minimization stops if the stopping criterion is not satisfied, from
$N_{\rm gen}^{\rm max} = 3 \times 10^4$ in NNPDF2.1 to $N_{\rm gen}^{\rm max} = 5 \times 10^4$. These new values of the stopping
parameters have been determined, as discussed in Ref. […] (see in particular Sect. 4.6 and
Fig. 8), by inspection of the minimization profiles for individual replicas, with the new
data set and the minimization parameters used now. Clearly both the increased number of
mutants and mutations, and the increase by almost a factor of two of the maximum training
length, are computationally quite demanding.

We have also introduced two more small improvements of the minimization procedure. 
             N_gen^wt   N_gen^mut   N_gen^max   E^sw   N_mut^a   N_mut^b
2.1 NLO      10000      2500        30000       2.6    80        10
2.1 NNLO     10000      2500        30000       2.3    80        30
2.3 NLO      10000      2500        50000       2.3    80        30
2.3 NNLO     10000      2500        50000       2.3    80        30

             2.1 NLO               2.1 NNLO and 2.3
PDF          N_mut   η_k           N_mut   η_k
Σ(x)         2       10, 1         2       10, 1
g(x)         2       10, 1         3       10, 3, 0.4
T_3(x)       2       1, 0.1        2       1, 0.1
V(x)         2       1, 0.1        3       8, 1, 0.1
Δ_S(x)       2       1, 0.1        3       5, 1, 0.1
s^+(x)       2       5, 0.5        2       5, 0.5
s^-(x)       2       1, 0.1        2       1, 0.1

Table 4: Parameter values for the genetic algorithm for NNPDF2.3 fits, compared to NNPDF2.1 
NLO and NNLO (top). The number of mutations and values of the mutation rates for each 
individual PDF are also given (bottom). 

First, we now discard outlier replicas whose value of $\chi^{2(k)}$ is more than four sigma
larger than the mean value evaluated over the replica sample (see Sec. 4.1 and Tab. 5):
such replicas are exceedingly unlikely in the $N_{\rm rep} = 100$ replica samples that we consider
here, and their inclusion would bias results.
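The outlier criterion can be sketched in a few lines. For illustration we assume that the mean and standard deviation are computed over the full replica sample (outliers included) and that the cut is one-sided. The function name `discard_outliers` and the toy $\chi^{2(k)}$ values are hypothetical.

```python
import numpy as np


def discard_outliers(chi2_rep, n_sigma=4.0):
    """Boolean mask keeping the replicas whose chi2 is not more than
    n_sigma standard deviations above the mean over the replica sample."""
    chi2_rep = np.asarray(chi2_rep, dtype=float)
    threshold = chi2_rep.mean() + n_sigma * chi2_rep.std()
    return chi2_rep <= threshold


# Hypothetical sample: 99 well-behaved replicas and one outlier
chi2_rep = np.array([2.0] * 50 + [2.2] * 49 + [10.0])
keep = discard_outliers(chi2_rep)
print(keep.sum())  # 99
```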

Second, for experiments with a small number of data points we include all data in 
the training set, rather than equally dividing them between training and validation sets. 
Indeed, fit results are quite stable upon changes of the value of the training fraction,
provided the samples of training and validation data are large enough to be representative
of the full data set, as shown in Ref. [13], where the size of the training fraction was varied
by a factor of two with essentially unchanged results. Experiments with a small number
of data points have little or no impact on the fulfillment of the stopping criteria, and
it is then advantageous to include all their points in the training sample in order to
maximise the information which is extracted from these data. In practice, we include in
the training set all the data points for all experiments with up to 30 data points: H1 $F_L$,
CDF W asymmetry, CDF and D0 Z rapidity distributions, ATLAS W and Z data, CMS
W electron asymmetry and LHCb W and Z data.
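The data-partitioning rule just described can be sketched as a per-experiment mask. The 30-point threshold is the one quoted above, and the 50% training fraction corresponds to dividing the data equally between training and validation. Drawing the training points at random (for instance independently for each replica) is our assumption, and the function `training_mask` is hypothetical.

```python
import numpy as np


def training_mask(n_points, rng, small_threshold=30, train_fraction=0.5):
    """Boolean mask of the points of one experiment used for training (sketch).

    Experiments with at most `small_threshold` points are used entirely for
    training. For larger ones, a random subset of size train_fraction * n_points
    is used for training and the rest for validation.
    """
    mask = np.zeros(n_points, dtype=bool)
    if n_points <= small_threshold:
        mask[:] = True
        return mask
    n_train = int(round(train_fraction * n_points))
    mask[rng.choice(n_points, size=n_train, replace=False)] = True
    return mask


rng = np.random.default_rng(0)
small = training_mask(11, rng)    # a hypothetical 11-point experiment
large = training_mask(100, rng)   # a hypothetical 100-point experiment
print(small.sum(), large.sum())   # 11 50
```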
NNPDF2.3                   NLO                NNLO
χ²_tot                     1.122              1.139
⟨E⟩ ± σ_E                  2.17 ± 0.05        2.19 ± 0.07
⟨E_tr⟩ ± σ_Etr             2.15 ± 0.07        2.17 ± 0.08
⟨E_val⟩ ± σ_Eval           2.20 ± 0.07        2.24 ± 0.10
⟨TL⟩ ± σ_TL                (24 ± 16) 10^3     (22 ± 15) 10^3
⟨χ²(k)⟩ ± σ_χ²             1.18 ± 0.03        1.21 ± 0.04
⟨σ^(exp)⟩_dat (%)          12.1               12.2
⟨σ^(net)⟩_dat (%)          3.0                3.0
⟨ρ^(exp)⟩_dat              0.18               0.18
⟨ρ^(net)⟩_dat              0.40               0.49

Table 5: Table of statistical estimators for the NNPDF2.3 NLO and NNLO fits with $N_{\rm rep} = 100$
replicas.

4 Results 

The main results of this paper are the NNPDF2.3 NLO and NNLO parton distributions. 
In this section we will discuss the statistical features of the corresponding fits, then present 
the NNPDF2.3 PDFs and compare them to other available NLO and NNLO sets. We have 
produced NNPDF2.3 PDFs for all values of $\alpha_s$ from 0.114 to 0.124 in steps of 0.001. We
have also produced PDF sets in which the number of active flavors does not increase beyond
$N_f = 4$ or $N_f = 5$, which are loosely referred to as "fixed flavor number" sets (even though,
strictly speaking, they are "maximum flavor number" sets). These are provided for all
values of $\alpha_s$ from 0.116 to 0.120 in steps of 0.001. A more extensive selection of plots than
that presented here is available from the NNPDF web site, http://nnpdf.hepforge.org/.

In order to ease the comparison with NNPDF2.1 and to gauge the impact of LHC data,
we have also produced NLO and NNLO sets based on exactly the same data set used for
NNPDF2.1, but including various methodological improvements (such as those discussed
in Sec. 3.3), which we call NNPDF2.3 noLHC: these are provided for values of $\alpha_s$ from
0.116 to 0.120 in steps of 0.001. Finally, in order to better elucidate the compatibility
of collider (including LHC) data with low-energy data, we have also produced NLO and
NNLO sets based only on collider data, which we call NNPDF2.3 collider, also provided
for all values of $\alpha_s$ from 0.116 to 0.120 in steps of 0.001. These will be discussed in
more detail in Sec. 5. In order to check the consistency of the LHC data with the rest of
the data included in the fit, we have also produced extended sets of NNPDF2.3 noLHC
NLO and NNLO PDFs including 500 replicas, which can be used to perform PDF
determinations in which LHC data are included by reweighting an existing set, according
to the methodology of Refs. […], to be discussed in Sec. 5.1.

All tables and plots in this section are produced using the PDF sets which correspond
to the value $\alpha_s(M_Z) = 0.119$ (consistent with the current PDG [91] value $\alpha_s(M_Z) =
0.1184 \pm 0.0007$) for ease of comparison with previous NNPDF papers. A determination of
$\alpha_s$ based on the NNPDF2.1 parton set was performed both at NLO [92] and NNLO [93].
However, the quality of the PDF fit is quite good for all values of $\alpha_s$ considered here, and







                NNPDF2.1          NNPDF2.3
                Global            Global Fit        Global RW         noLHC             Collider
Experiment      NLO     NNLO      NLO     NNLO      NLO     NNLO      NLO     NNLO      NLO      NNLO
Total           ?       ?         ?       ?         ?       ?         ?       ?         ?        ?
NMC-pd          0.97    0.93      0.95    0.95      0.93    0.93      0.93    0.94      [5.33]   [5.13]
NMC             1.68    1.58      1.61    1.59      1.62    1.57      1.59    1.56      [1.89]   [1.83]
SLAC            1.34    1.04      1.24    1.00      1.27    1.01      1.28    1.04      [1.72]   [1.41]
BCDMS           ?       ?         ?       ?         ?       ?         ?       ?         [1.85]   [2.15]
CHORUS          ?       ?         ?       ?         ?       ?         ?       ?         [1.73]   [1.70]
NTVDMN          ?       ?         ?       ?         ?       ?         ?       ?         [?]      [?]
HERAI-AV        1.04    1.04      1.00    1.01      1.00    1.02      1.01    1.03      0.97     0.99
DYE605          ?       ?         ?       ?         ?       ?         ?       ?         ?        ?
D0 RII cone     ?       ?         ?       ?         ?       ?         ?       ?         ?        ?
Table 6: The $\chi^2$ values per data point for individual experiments, computed using the default
NNPDF2.1 NLO and NNLO PDF sets and various NNPDF2.3 PDF sets. All values have been
obtained using $N_{\rm rep} = 100$ replicas with $\alpha_s(M_Z) = 0.119$. Normalization uncertainties have been
included using the experimental covariance matrix (note that in Tab. 5 the $t_0$ [16] covariance matrix
for normalization uncertainties is used). Values in square brackets correspond to experiments not
included in the corresponding fit: these are not included in the total $\chi^2$.

we see no indication of pathological behaviour of PDFs for any value of $\alpha_s$. As a consequence,
users can choose the PDF set corresponding to the value of $\alpha_s$ they prefer.
Combined PDF+$\alpha_s$ uncertainties may be determined by combining replicas from sets
corresponding to different values of $\alpha_s$, as discussed in Sec. 3.2 of Ref. […]. As a default, the
current PDG value and uncertainty may be used unless there are reasons to do otherwise:
this value is obtained by combining determinations of $\alpha_s$ at various perturbative orders,
and it is thus meant to be appropriate both at NLO and NNLO.
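One way to combine replicas from sets with different $\alpha_s$ is sketched below, under our own assumptions. From each set we draw a number of replicas proportional to a Gaussian weight in $\alpha_s$, centred on the chosen central value. The helper `replicas_per_alphas`, the rounding rule, and the use of the PDG numbers quoted above as inputs are all illustrative choices. The procedure to follow in practice is the one described in the reference cited above.

```python
import math


def replicas_per_alphas(alphas_values, alphas_central, alphas_unc, n_total):
    """Hypothetical helper: number of replicas to take from each alpha_s set,
    proportional to exp(-(a - a0)^2 / (2 * da^2)), rounded to the nearest integer."""
    weights = [math.exp(-((a - alphas_central) ** 2) / (2.0 * alphas_unc ** 2))
               for a in alphas_values]
    norm = sum(weights)
    return {a: round(n_total * w / norm) for a, w in zip(alphas_values, weights)}


counts = replicas_per_alphas([0.116, 0.117, 0.118, 0.119, 0.120],
                             alphas_central=0.1184, alphas_unc=0.0007, n_total=100)
print(counts)
```

Because each count is rounded independently, the total can differ from `n_total` by a replica or two.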

4.1 Statistical features 

In Tab. 5 we summarize the statistical estimators for the NNPDF2.3 NLO and NNLO fits
with $N_{\rm rep} = 100$ replicas. A detailed discussion of statistical indicators and their meaning
can be found in Refs. […]: here we merely recall that $\chi^2_{\rm tot}$ is computed by comparing
the central (average) NNPDF2.3 fit to the original experimental data, $\langle\chi^{2(k)}\rangle$ is computed
by comparing each NNPDF2.3 replica to the data and averaging over replicas, while $\langle E\rangle$
is the quantity that is actually minimized, i.e. it coincides with the $\chi^2$ computed by
comparing each NNPDF2.3 replica to the data replica it is fitted to, with the three values
given corresponding to the total, training, and validation data sets. All these estimators,
including in particular the $\chi^2$, are normalized to the relevant number of data points.
When comparing two different fits we will also show distances between central values and
uncertainties, computed using sets of $N_{\rm rep} = 100$ replicas: in this respect, recall that two
points extracted from distributions that differ by one standard deviation have an average
distance $d = 10$, while two points extracted from the same distribution have an average
distance $d = 1$ (the difference being due to the fact that the standard deviation of the
mean is a factor $\sqrt{N_{\rm rep}}$ smaller than the standard deviation of the distribution).
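The difference between $\chi^2_{\rm tot}$, $\langle\chi^{2(k)}\rangle$ and $\langle E\rangle$ can be illustrated with a toy example. The sketch below uses random "data", a toy diagonal covariance matrix and toy replica predictions; all of these, and the helper `chi2_per_point`, are hypothetical. It also omits the training/validation split. Because $\chi^2$ is a convex quadratic function of the predictions, the $\chi^2$ of the average prediction can never exceed the average $\chi^2$ of the replicas.

```python
import numpy as np


def chi2_per_point(theory, data, cov):
    """chi^2 per data point; solves C x = r instead of inverting C."""
    r = np.asarray(theory, dtype=float) - np.asarray(data, dtype=float)
    return float(r @ np.linalg.solve(cov, r)) / len(r)


rng = np.random.default_rng(0)
n_dat, n_rep = 20, 100                                     # hypothetical sizes
data = rng.normal(size=n_dat)                              # toy experimental data
cov = np.diag(rng.uniform(0.5, 1.5, size=n_dat))           # toy covariance matrix
pseudo = rng.multivariate_normal(data, cov, size=n_rep)    # one data replica per fit
fits = pseudo + rng.normal(scale=0.3, size=(n_rep, n_dat))  # toy replica predictions

chi2_tot = chi2_per_point(fits.mean(axis=0), data, cov)          # central fit vs data
chi2_rep = np.mean([chi2_per_point(t, data, cov) for t in fits])  # <chi2^(k)>
e_rep = np.mean([chi2_per_point(t, d, cov)                        # <E>: each replica
                 for t, d in zip(fits, pseudo)])                  # vs its own data replica

print(f"chi2_tot = {chi2_tot:.2f}, <chi2(k)> = {chi2_rep:.2f}, <E> = {e_rep:.2f}")
```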

In Tab. 6 we compare the $\chi^2_{\rm tot}$ for NNPDF2.1 and NNPDF2.3 PDF sets both at NLO
and at NNLO. The $\chi^2$ of each data set is also shown. For the NNPDF2.3 fit, we also show
the $\chi^2$ values for the noLHC and the collider-only fits, and the $\chi^2$ obtained by reweighting the
NNPDF2.3 noLHC PDF sets with the new LHC data, using 500 replicas. The noLHC sets
are discussed in detail in Sec. 4.3 and also in Sec. 5.1, where they are also used to construct
reweighted sets, while the collider-only PDF sets are discussed in Sec. 5.2.

It should be noticed that all the statistical estimators of Tab. 5, and specifically
the $\chi^2$, are determined using the $t_0$ method for the treatment of normalization
uncertainties [16], while all $\chi^2$ values in Tab. 6 are computed using the experimental covariance
matrix: the former is needed for unbiased minimization, while the latter yields a measure
of the goodness of fit. The $t_0$ method for fitting normalization uncertainties leads to
statistically unbiased results, and it does not require the ad hoc use of quartic penalties in
the treatment of normalization uncertainties, as often required when these are treated by
means of an offset method (see e.g. Ref. [96]). However, recent benchmarking exercises
suggest that, in the context of the existing global fits, the results obtained with the $t_0$ and offset
methods are very close to each other [97].
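The difference between the two covariance matrices lies only in what the fully correlated normalization term is built from. The experimental definition uses the measured data, while the $t_0$ definition uses fixed theory predictions from a previous fit. The sketch below uses the classic two-measurement toy example (two measurements of the same quantity, 8.0 and 8.5, each with a 2% uncorrelated uncertainty and a common 10% normalization uncertainty). It shows that a generalized least-squares fit with the experimental matrix lands below both measurements, while the $t_0$ matrix does not. The helpers and the $t_0$ prediction (8.25) are hypothetical choices for illustration.

```python
import numpy as np


def cov_with_normalization(sigma_uncorr, sigma_norm, central):
    """Uncorrelated errors plus one fully correlated multiplicative normalization
    uncertainty of relative size sigma_norm, scaled by the vector `central`
    (data -> experimental covariance matrix, fixed prior theory -> t0 matrix)."""
    central = np.asarray(central, dtype=float)
    diag = np.diag(np.asarray(sigma_uncorr, dtype=float) ** 2)
    return diag + sigma_norm ** 2 * np.outer(central, central)


def fit_constant(data, cov):
    """Generalized least-squares estimate of a single constant k."""
    ones = np.ones(len(data))
    w = np.linalg.solve(cov, ones)  # C^{-1} 1, since C is symmetric
    return float(w @ data) / float(w @ ones)


data = np.array([8.0, 8.5])            # two measurements of the same quantity
sigma_uncorr = np.array([0.16, 0.17])  # 2% uncorrelated uncertainties
sigma_norm = 0.10                      # 10% common normalization uncertainty
t0 = np.full(2, 8.25)                  # hypothetical prediction from a prior fit

k_exp = fit_constant(data, cov_with_normalization(sigma_uncorr, sigma_norm, data))
k_t0 = fit_constant(data, cov_with_normalization(sigma_uncorr, sigma_norm, t0))
print(round(k_exp, 2), round(k_t0, 2))  # 7.87 8.23
```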

We have checked that the difference between the two $\chi^2$ values ($t_0$ and experimental
covariance matrix) is of the order of the expected statistical fluctuation of the total $\chi^2$
(i.e. of order $1/\sqrt{N_{\rm dat}}$ for the $\chi^2$ per data point as given here). It should be noticed
that the $\chi^2$ values for the NNPDF2.1 NLO and NNLO fits differ from those given in
Tab. 4 of [7] and Tab. 6 of [8] respectively, both because the latter were given using the
$t_0$ definition of the covariance matrix, and because they were computed using sets
of 1000 replicas, while only 100 replicas are used for all sets of Tab. 6. Finally, the $\chi^2$
value reported in Refs. […] for NTVDMN (NuTeV dimuons) was affected by an error,
discussed below in connection with Eq. (16).

In Tab. 7 we compare the average uncertainties $\langle\sigma^{(\rm exp)}\rangle_{\rm dat}$ of the experimental data,
for each separate data set, to the average uncertainties $\langle\sigma^{(\rm net)}\rangle_{\rm dat}$ of the predictions for
those data due to PDF uncertainties, obtained using each of the various PDF sets. Clearly
the latter are rather smaller than the experimental uncertainties, due to the extra information
coming from the other data sets, but it is interesting to see how they compare between
NLO and NNLO, and between different fits.

The distributions of $\chi^{2(k)}$, $E_{\rm tr}^{(k)}$ and training lengths among the 100 NNPDF2.3 NLO
and NNLO replicas are shown in Figs. 2 and 3 respectively. While most of the replicas
fulfil the stopping criterion, a fraction (about 20%) of them stops at the maximum training
length $N_{\rm gen}^{\rm max}$, which has been introduced in order to avoid unacceptably long fits. As in
previous PDF determinations, we have explicitly verified that if we were to discard all
replicas that do not stop dynamically, PDFs would change by an amount smaller than
a typical statistical fluctuation.
Table 7: The average percentage value of the experimental uncertainty $\langle\sigma^{(\rm exp)}\rangle_{\rm dat}$ and of the PDF
uncertainty $\langle\sigma^{(\rm net)}\rangle_{\rm dat}$ for each data set, for all the NNPDF2.1 and NNPDF2.3 NLO and NNLO
PDF sets. All the values have been obtained including normalization uncertainties
using the experimental covariance matrix. Values in square brackets correspond to experiments
not included in the corresponding fit.

Figure 2: Distribution of $\chi^{2(k)}$ (upper plots) and $E_{\rm tr}^{(k)}$ (lower plots) over the sample of
replicas, for the NNPDF2.3 NLO (left plots) and NNLO (right plots) PDF sets.

Figure 3: Distribution of training lengths over the sample of $N_{\rm rep} = 100$ replicas for the NNPDF2.3
NLO (left plot) and NNPDF2.3 NNLO (right plot) PDF sets.



4.2 NNPDF2.3 parton distributions 

The NNPDF2.3 NLO and NNLO PDFs are shown, along with the corresponding PDFs
from NNPDF2.1, in Figs. 4 and 5 (NLO) and Figs. 6 and 7 (NNLO). It is clear that all PDFs
from the two sets differ by less, and usually much less, than one sigma, with differences
being generally smaller at NNLO.

Figure 4: NLO NNPDF2.3 singlet sector PDFs at $Q^2 = 2$ GeV$^2$, compared to their NNPDF2.1
counterparts, computed using $N_{\rm rep} = 100$ replicas from both sets. All error bands shown
correspond to a one sigma interval.

A more accurate assessment of the difference between the NNPDF2.1 and NNPDF2.3
sets can be obtained by looking at the distance between the two sets, shown in Figs. 8 and
9. The largest changes are in the gluon and in the valence flavor decomposition (i.e. in
the sea asymmetry, triplet and strangeness), where we would indeed expect jet and gauge
boson production to have some impact. The ratio of the NNPDF2.1 to NNPDF2.3 NNLO
PDFs at $Q^2 = 10^{?}$ GeV$^2$ is shown for the gluon, singlet, triplet and strangeness in Fig. 10.
The origin of these differences is addressed in detail in Sec. 4.3.

4.3 Detailed comparison to NNPDF2.1 

The NNPDF2.3 sets differ from the NNPDF2.1 ones not only because of the addition
of LHC data, but also because of the improvements in the neural network training
procedure presented in Sec. 3, and finally because of the correction of an error in Eq. (33) of
Ref. [7].* This error only affects the NuTeV dimuon cross-sections, which in turn only
have a significant effect on the strange distribution.

*The correct equation should read

$$
\tilde{\sigma}(x,y,Q^2) = \frac{G_F^2 M_N}{2\pi\,(1+Q^2/M_W^2)^2}\,\left[\,\ldots\,\right] \qquad (16)
$$

In Eq. (33) of Ref. [7] there is a spurious factor […]. This is the same expression as Eq. (1) of
Ref. [15]: in that reference a so-called improved zero-mass variable-flavor-number scheme is used, and
this factor provides the desired improvement. In Ref. [7], however, a general-mass scheme is used, in which
the charm mass is treated exactly, so this factor is unnecessary and thus spurious.

Figure 5: Same as Fig. 4 for the nonsinglet sector NLO PDFs.

Figure 6: Same as Fig. 4 but at NNLO.

In order to isolate the effect of each of these changes, we have performed an NNPDF2.3 
fit without LHC data, i.e. with the same data set as NNPDF2.1, but with the method- 
ological improvements discussed in the previous section, supplemented by the correction 
of the error in the dimuon cross-section. 

We first determine distances between NNPDF2.1 and NNPDF2.3 noLHC: this measures
the effect of the various improvements, with a fixed data set. The distances are shown
in Figs. 11 (NLO) and 12 (NNLO). The largest distances are observed for the
NLO strange, gluon, and sea asymmetry PDFs, for which a direct comparison between
NNPDF2.1 and NNPDF2.3 noLHC is shown in Fig. 13.

At NNLO the distances for essentially all PDFs except total strangeness are compatible
with purely statistical fluctuations ($d \sim 1$ corresponds to statistically equivalent fits).
Strangeness (also shown in Fig. 13) changes in a statistically significant way, though at
most by about half a sigma, in the $10^{-2} < x < 10^{-1}$ range. We have checked that if the
error in Eq. (16) is corrected with everything else left unchanged, then the distance in
strangeness is somewhat smaller, but roughly of the same order: hence the change in
strangeness is mostly due to the correction of this error. It is interesting to observe that,
even though the change in each individual PDF is statistically insignificant, the
improvement in the quality of the global fit is still significant: the $\chi^2$ per data point decreases
from 1.167 to 1.147, corresponding to a decrease by about 70 units of the total $\chi^2$ of the
fit, i.e. a decrease of the $\chi^2$ by slightly more than one sigma. Note that
this decrease is not due to the NuTeV dimuon data, whose $\chi^2$ only decreases by a couple of
units, and thus it must be attributed to the improved minimization. Because at NNLO
this only amounts to a more stringent stopping criterion and a higher maximum number of iterations,
we must conclude that the higher $\chi^2$ value of the NNPDF2.1 NNLO fit was due to slight
underlearning.

Figure 7: Same as Fig. 5 but at NNLO.

The improvement in fit quality is even more marked at NLO, where essentially all
PDFs undergo changes at the half-sigma level, and the $\chi^2$ decreases by about 150 units
(i.e. about two sigma). The fact that more significant changes are observed in PDF shapes
at NLO can be understood as a consequence of having increased the number of mutations
and mutants in this case.

Next we compare the NNPDF2.3 noLHC and the NNPDF2.3 fits at NLO and NNLO,
in order to gauge the genuine impact of LHC data. The distances between these two fits
are shown in Figs. 14 and 15. The largest distances in central values are observed for
the NNLO total strangeness and small-$x$ quark singlet PDFs, and to a lesser extent for the
gluon; for all of these a direct comparison between NNPDF2.3 noLHC and NNPDF2.3 is
shown in Fig. 16.

Figure 8: Distances between NNPDF2.1 and NNPDF2.3 NLO.

The impact of LHC data is clearly moderate, with no distance larger than four at 
NLO, which means that the fitted PDFs differ by less than half sigma. At NNLO the 
effect seems to be a bit larger. This confirms the consistency of the PDFs extracted from 
lower energy experiments with the PDFs extracted from LHC data. The description of the 
LHC experiments is of course better in NNPDF2.3 than in NNPDF2.3 noLHC, although 
the starting agreement was already very reasonable, as is clearly seen in the comparison 
shown in Tab. 6.

While all changes are moderate, the main effect of the LHC data is to lead to somewhat 
smaller gluon uncertainties, thanks to the inclusion of the ATLAS jet data, and especially 
a more accurate light quark flavor decomposition thanks to the LHC electroweak vector 
boson production data. This in turn leads to an improvement in the accuracy of standard 
candle cross-sections, both dependent on the gluon (top production) and the quark (gauge 
boson production), as we will see in Sec. 6.2.
Figure 9: Distances between NNPDF2.1 and NNPDF2.3 NNLO.

Figure 11: Distances between NNPDF2.3 noLHC and NNPDF2.1 NLO.

Figure 12: Distances between NNPDF2.3 noLHC and NNPDF2.1 NNLO.

Figure 13: Comparison of some PDFs from the NNPDF2.1 and NNPDF2.3 noLHC sets. Top:
NLO small-$x$ gluon (left), $\bar d - \bar u$ (right); middle: NLO $s^- = s - \bar s$ at small $x$ (left) and large $x$ (right) at
$Q^2 = 2$ GeV$^2$; bottom: small-$x$ $s^+ = s + \bar s$ at NLO (left) and NNLO (right).

Figure 14: Distances between NNPDF2.3 noLHC and NNPDF2.3 NLO.

Figure 15: Distances between NNPDF2.3 noLHC and NNPDF2.3 NNLO.

Figure 16: Comparison of some PDFs from the NNPDF2.3 noLHC and NNPDF2.3 NNLO sets.
Top: gluon at small $x$ (left) and large $x$ (right); bottom: small-$x$ total singlet (left) and total
strangeness (right).
5 The LHC data



We will now address in more detail the impact of LHC data on NNPDF2.3 parton
distributions. First, we extend the discussion of Sec. 4.3, where NNPDF2.3 fits with and
without LHC data were compared, by comparing a fit in which LHC data are included in 
the full fitting data set with a fit in which the LHC data are included by reweighting a fit 
which does not include them: this enables us to assess more accurately both the impact 
and the consistency of the LHC data in the global fit. This analysis is supplemented by an 
explicit assessment of the quality of the fit to LHC data before and after their inclusion. 
We then consider a PDF determination which is based on collider data only. This enables 
us to discuss the consistency between collider data and the lower-energy fixed-target data, 
several of which are affected by nuclear corrections. The LHC data, by expanding the set
of available collider data, shed some further light on this comparison. Finally, we present
a dedicated analysis of the strangeness fraction of the nucleon, and the impact of LHC 
data on its determination. 

5.1 Impact and consistency of the LHC data in the global fit 

New data can be included in an existing PDF fit either by simply redoing the fit, or by
reweighting the replicas of an existing fit […]. Besides providing a strong consistency
check of the fitting procedure, the inclusion of the data by reweighting allows one to test
for the impact and consistency of the new data. For example, one can determine the
probability distribution $P(\alpha)$ for the rescaling of the uncertainties of the new data set by a
factor $\alpha$: for fully consistent data the mean value of $\alpha$ should be $\langle\alpha\rangle \sim 1$. Furthermore,
one may determine the effective number of replicas $N_{\rm eff}$ left after reweighting the initial set
of $N_{\rm rep}$ replicas: if $N_{\rm eff} \ll N_{\rm rep}$, the new data either bring in considerable new information
or they are very inconsistent with the pre-existing data, and conversely.
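A minimal sketch of these reweighting quantities is given below. It assumes the standard forms used in the NNPDF reweighting method, which should be checked against the references before relying on them. The weights are $w_k \propto (\chi^2_k)^{(n-1)/2} e^{-\chi^2_k/2}$, normalized so that $\sum_k w_k = N_{\rm rep}$. The effective number of replicas is $N_{\rm eff} = \exp\{\frac{1}{N_{\rm rep}}\sum_k w_k \ln(N_{\rm rep}/w_k)\}$. The rescaling distribution is $P(\alpha) \propto \frac{1}{\alpha}\sum_k w_k(\alpha)$, obtained by replacing $\chi^2_k \to \chi^2_k/\alpha^2$. Here $\chi^2_k$ is the total (not per-point) $\chi^2$ of replica $k$ to the $n$ new data points. The $\chi^2$ values used below are toy numbers.

```python
import numpy as np


def log_w(chi2, n_dat, alpha=1.0):
    """Log of the unnormalized weight (chi2/alpha^2)^((n-1)/2) exp(-chi2/(2 alpha^2))."""
    x = np.asarray(chi2, dtype=float) / alpha ** 2
    return 0.5 * (n_dat - 1) * np.log(x) - 0.5 * x


def reweight(chi2, n_dat):
    """Weights normalized to sum to N_rep, and the effective number of replicas."""
    lw = log_w(chi2, n_dat)
    w = np.exp(lw - lw.max())    # subtract the maximum to avoid overflow
    n_rep = len(w)
    w *= n_rep / w.sum()         # normalization: sum_k w_k = N_rep
    nz = w > 0                   # replicas with w_k = 0 contribute nothing
    n_eff = np.exp(np.sum(w[nz] * np.log(n_rep / w[nz])) / n_rep)
    return w, n_eff


def p_alpha(chi2, n_dat, alphas):
    """P(alpha) proportional to (1/alpha) sum_k w_k(alpha), normalized on a uniform grid."""
    alphas = np.asarray(alphas, dtype=float)
    lw = np.array([log_w(chi2, n_dat, a) for a in alphas])
    p = np.exp(lw - lw.max()).sum(axis=1) / alphas
    return p / (p.sum() * (alphas[1] - alphas[0]))


# Toy usage: hypothetical chi2 values of 100 replicas to a 30-point data set
rng = np.random.default_rng(0)
n_dat = 30
w_flat, neff_flat = reweight(np.full(100, 30.0), n_dat)  # all replicas equally good
chi2 = rng.chisquare(n_dat, size=100)
w, n_eff = reweight(chi2, n_dat)
alphas = np.linspace(0.5, 3.0, 251)
p = p_alpha(chi2, n_dat, alphas)
print(f"N_eff = {n_eff:.1f} (flat case: {neff_flat:.1f}), "
      f"most likely alpha = {alphas[p.argmax()]:.2f}")
```

When all replicas describe the new data equally well, every weight equals one and $N_{\rm eff} = N_{\rm rep}$. Any spread in the $\chi^2_k$ lowers $N_{\rm eff}$.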

We have thus reweighted a set of 500 NNPDF2.3 noLHC NLO or
NNLO replicas with $\alpha_s(M_Z) = 0.119$ using the various LHC data sets. The $\chi^2$ values
obtained by comparing the predictions of this reweighted set to the data are compared in
Tab. 6 to those of the standard NNPDF2.3 NLO and NNLO fits. The good agreement
between reweighted and refitted $\chi^2$ values provides evidence for the consistency and efficiency
of the fitting methodology. Furthermore, in Fig. 17 we show the probability distributions
$P(\alpha)$ for each of the four LHC data sets added to the fit, at NLO and NNLO, and in
Tab. 8 the values of $\langle\alpha\rangle$ and $N_{\rm eff}$ obtained by reweighting at NLO or NNLO, first with
each LHC data set individually, and then with all LHC data. The $P(\alpha)$
distributions generally show good consistency of the LHC data with the rest, with perhaps
a marginal inconsistency for the ATLAS and CMS gauge boson production. This might
suggest some tension between the flavor decomposition favoured by low-energy and
collider data, to which we will return in Sec. 5.2. The values of $N_{\rm eff}$ show that even though
the impact of the LHC data is moderate, it is not negligible: for example, the ATLAS 

^The possibility of fitting PDFs to new data via reweighting was originally suggested in Refs. [98, 99]. However, the expression for the weights given in these references was incorrect, for a reason related to the so-called Borel-Kolmogorov paradox of probability theory [100] and explained in Sect. 2.3 of Ref. [B]. Using the weight formula of Ref. [98] generally leaves all normalized weights but one vanishingly small (i.e. only one replica survives reweighting). The reweighting method of Ref. was recently used successfully in Ref. [SS] in the context of the MSTW PDF fits.
NNPDF2.3 noLHC reweighted with LHC data
Table 8: The effective number of replicas $N_{\rm eff}$ and the average uncertainty rescaling $\langle\alpha\rangle$ (defined in Ref. [16]), obtained by reweighting a set of $N_{\rm rep}=500$ NNPDF2.3 noLHC NLO and NNLO replicas with $\alpha_s(M_Z)=0.119$, first with each of the LHC data sets and then with all of them.

gauge boson production data alone effectively discard more than two thirds of the starting replicas. The impact of the LHC data is clearly visible in Tab. 7: the uncertainty in the prediction for gauge boson production at the LHC is roughly halved when NNPDF2.3 PDFs are used instead of NNPDF2.3 noLHC PDFs.
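The $P(\alpha)$ diagnostic can be sketched in the same spirit. The sketch assumes, as in the reweighting literature, that $P(\alpha) \propto \frac{1}{\alpha}\sum_k \mathcal{W}(\chi^2_k/\alpha^2)$, where $\mathcal{W}(\chi^2)=(\chi^2)^{(n-1)/2}e^{-\chi^2/2}$ is the same kernel used for the weights. The snippet evaluates this on a grid of $\alpha$ values. The function name `p_alpha` and all inputs are hypothetical. Inflating every $\chi^2_k$ by a factor of two mimics an inconsistent data set and pushes $\langle\alpha\rangle$ well above one.

```python
import numpy as np

def p_alpha(chi2, n_dat, alphas):
    """Discretized P(alpha) on the grid `alphas`, normalized to sum to one.

    Assumes P(alpha) ~ (1/alpha) * sum_k W(chi2_k / alpha^2), with
    W(x) = x**((n_dat - 1)/2) * exp(-x/2) the reweighting kernel.
    """
    alphas = np.asarray(alphas, dtype=float)
    # x[i, k] = chi2_k / alpha_i^2: each replica's chi2 after rescaling the errors
    x = np.asarray(chi2, dtype=float)[None, :] / alphas[:, None] ** 2
    log_terms = 0.5 * (n_dat - 1) * np.log(x) - 0.5 * x
    # log-sum-exp over replicas, for numerical stability
    m = log_terms.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    log_p = log_sum - np.log(alphas)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

# Hypothetical inputs: 500 replicas, 50 new data points.
rng = np.random.default_rng(2)
n_dat = 50
chi2 = rng.chisquare(df=n_dat, size=500)        # "consistent" new data
alphas = np.linspace(0.5, 3.0, 2501)

p_ok = p_alpha(chi2, n_dat, alphas)
p_bad = p_alpha(2.0 * chi2, n_dat, alphas)      # inflated chi2: inconsistent data
mean_ok = float(np.sum(alphas * p_ok))
mean_bad = float(np.sum(alphas * p_bad))
print(f"<alpha> consistent: {mean_ok:.2f}, inconsistent: {mean_bad:.2f}")
```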







Figure 17: The $P(\alpha)$ distributions for each of the LHC experiments included in NNPDF2.3: the ATLAS 2010 jets, the ATLAS 2010 W,Z data, the CMS 2011 W electron asymmetry and the LHCb 2010 W data. All are determined using either NLO or NNLO theory, with $\alpha_s(M_Z)=0.119$. The mean values of $\alpha$ computed from each of these distributions are given in Tab. 8.

The impact of the LHC data can also be seen by comparing the predictions for the fitted observables before and after their inclusion in the fit. As discussed in Sec. 4.3, the values of Tab. 6 already show that the fit quality is acceptable before the LHC data are included, and quite good afterwards. A data-theory comparison confirms this. The predictions from the NNPDF2.1 and NNPDF2.3 $\alpha_s(M_Z)=0.119$ sets for the various LHC observables are compared to the data in Fig. 18 (ATLAS jets), Fig. 19 (ATLAS and CMS W,Z production) and Fig. 20 (LHCb W production). For the LHC electroweak data we show the predictions from the NNLO PDF sets. For the ATLAS inclusive jets we show the corresponding NLO predictions, normalized to the NNPDF2.3 prediction. Note that much of the uncertainty in the jet data is the fully correlated normalization uncertainty, which can shift the entire data set up or down. Inspection of the plot can therefore be somewhat misleading, and the values of Tab. 6 should be checked to assess fit quality.
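A toy $\chi^2$ shows why a coherent normalization shift can look worse by eye than it is. The sketch below assumes ten hypothetical data points with 1% uncorrelated errors and a 5% fully correlated normalization uncertainty. The normalization enters the covariance matrix as a rank-one term built from the theory predictions, which is one common choice and is assumed here. The data sit exactly one normalization sigma above theory. The helper `chi2` is illustrative and is not part of the NNPDF code.

```python
import numpy as np

def chi2(data, theory, cov):
    """chi2 = r^T C^{-1} r with residual r = data - theory."""
    r = np.asarray(data, dtype=float) - np.asarray(theory, dtype=float)
    return float(r @ np.linalg.solve(cov, r))

n = 10
theory = np.full(n, 100.0)        # hypothetical predictions
stat = np.full(n, 1.0)            # hypothetical uncorrelated errors (1%)
norm = 0.05                       # hypothetical 5% normalization uncertainty
data = theory * (1.0 + norm)      # every point shifted up by one normalization sigma

norm_err = norm * theory
# Correct treatment: the normalization error is fully correlated between points.
cov_full = np.diag(stat**2) + np.outer(norm_err, norm_err)
# Same errors added in quadrature point by point, with the correlation ignored.
cov_diag = np.diag(stat**2 + norm_err**2)

chi2_full = chi2(data, theory, cov_full)   # 250/251, about 1.0
chi2_diag = chi2(data, theory, cov_diag)   # 250/26, about 9.6
print(f"chi2 with correlations: {chi2_full:.2f}; without: {chi2_diag:.2f}")
```

A single coherent shift costs about one unit of $\chi^2$ for the whole data set. A point-by-point reading of the plot would instead suggest a much poorer fit.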
Figure 18: Comparison of the ATLAS inclusive jet data with predictions from NNPDF2.1 and NNPDF2.3 NLO PDFs with $\alpha_s(M_Z)=0.119$. We show the ratio of data over theory, normalized to NNPDF2.3, divided into rapidity bins. The experimental error bars are statistical. The (correlated) systematic uncertainty, including normalization errors, is shown as a band at the bottom of each plot.
Figure 19: Same as Fig. 18, but for ATLAS and CMS gauge boson production data, now using NNLO PDFs.







Figure 20: Same as Fig. 18, but for LHCb electroweak gauge boson production data, now using NNLO PDFs.






5.2 Consistency of LHC data with low-energy data 



It has been observed previously [6, 101, 102] that there seems to be some tension between the flavor decomposition favoured by low-energy data and that obtained from Tevatron W production data. The LHC data will eventually resolve any such discrepancy, and they already shed some light on the issue. To investigate this, as already mentioned, we have constructed sets of NLO and NNLO NNPDF2.3 collider PDFs. The NNPDF2.3 collider PDFs are based on a data set that includes only the following:
• the HERA-I inclusive data;
• the ZEUS HERA-II data;
• the H1 and ZEUS $F_2^c$ charm structure function data;
• the W and Z production data from the Tevatron and the LHC;
• the CDF and D0 Run II inclusive jet production data;
• the ATLAS inclusive jet data.
This reduces the size of the data set from about 3500 to about 1200 data points (see Tab. 2).

The distances between PDFs in the NNPDF2.3 collider and NNPDF2.3 default sets at NLO and NNLO with $\alpha_s(M_Z)=0.119$ are shown in Figs. 21 and 22 respectively. The distances between NNPDF2.3 collider and NNPDF2.3 noLHC PDFs are shown in Figs. 23 and 24. Almost all PDFs change at the one or two sigma level, both at NLO and at NNLO. The patterns of distances show no significant differences between NLO and NNLO, or between comparisons to the standard and to the noLHC sets. In particular, this suggests that the tension between collider data and low-energy data, if it exists, is only mild. Indeed, the noLHC set contains far fewer collider data. If the low-energy data were inconsistent with collider data, distances should therefore be significantly larger when comparing to the noLHC set than when comparing to the default set.

To investigate this in more detail, we have sampled the distances shown in Figs. 21 to 24 for each PDF at 100 points in $x$. Of these, 50 are equally spaced on a logarithmic scale at small $x$ up to $x=10^{-1}$, and 50 more on a linear scale from 0.1 to 0.9. For each comparison, we have then produced a histogram of the distribution of distances. For each $x$ value, the distance is defined as a sum over replicas of normalized square differences of predictions obtained from a Gaussian distribution. For each point and PDF it should therefore follow a distribution with one degree of freedom. The combined histogram still follows the same distribution if correlations are uniform. The histogram is distorted only if a large number of points are correlated with one particular point. We have verified that this is not the case: the histograms do not change when redone with a decreasing number of points, which completely modifies the pattern of correlations.
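A toy simulation can check the statistical statement above. The sketch assumes that the squared distance between central values is $d^2 = (\langle q\rangle_1-\langle q\rangle_2)^2/(\sigma_1^2/N_1+\sigma_2^2/N_2)$. This is one common definition, used here only for illustration. For two replica samples drawn from the same Gaussian, $d^2$ should then follow a $\chi^2$ distribution with one degree of freedom, up to small finite-$N_{\rm rep}$ corrections. The function `central_distance` is hypothetical.

```python
import numpy as np

def central_distance(rep1, rep2):
    """Squared distance between the central values of two replica samples,
    normalized to the variance of the difference of the two sample means."""
    m1, m2 = rep1.mean(), rep2.mean()
    var_mean = rep1.var(ddof=1) / rep1.size + rep2.var(ddof=1) / rep2.size
    return (m1 - m2) ** 2 / var_mean

# Two statistically equivalent "PDF sets": replicas drawn from the same Gaussian.
rng = np.random.default_rng(3)
n_rep, n_trials = 100, 2000
d2 = np.array([
    central_distance(rng.normal(0.0, 1.0, n_rep), rng.normal(0.0, 1.0, n_rep))
    for _ in range(n_trials)
])
print(f"mean d^2 = {d2.mean():.2f}   (chi2 with 1 dof: 1)")
print(f"fraction d < 1 = {np.mean(np.sqrt(d2) < 1.0):.2f}   (chi2 with 1 dof: 0.68)")
```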

Fig. 25 compares the normalized histograms to this distribution. The four histograms show no significant difference from each other, and all are well consistent with the theoretical distribution. This suggests that the differences between PDFs in the collider-only and global fits are consistent with a purely statistical distribution, based on the given PDF uncertainties. Hence the different behaviour of NNPDF2.3 and NNPDF2.3 collider PDFs seen in Fig. 26 (e.g. for the triplet at small $x$) is compatible with a statistical fluctuation.

Based on these conclusions, one might be tempted to recommend the collider PDFs. They are free from nuclear corrections and much less sensitive to possible higher-twist corrections. At the same time, they retain the abundant, statistically accurate and theoretically very reliable deep-inelastic data from the HERA experiments. However, this option is not viable at present, because the collider PDFs still have rather large statistical uncertainties. This is apparent from Tab. 3, where it is seen that the uncertainties




Figure 21: Distances between PDFs from the NNPDF2.3 collider and NNPDF2.3 NLO sets.



of all the observables measured by fixed-target experiments become unacceptably large when the NNPDF2.3 collider fit is used. The same is visible in Fig. 26, which compares the singlet, gluon, triplet and sea asymmetry in the collider and default NNLO fits. A collider-only fit is therefore not viable at present. This may change for future collider PDFs, which will include both the final, as yet unpublished, HERA-II combined deep-inelastic data and future LHC data.




Figure 22: Distances between PDFs from the NNPDF2.3 collider and NNPDF2.3 NNLO sets.







Figure 23: Distances between PDFs from the NNPDF2.3 collider and NNPDF2.3 noLHC NLO 
sets. 
Figure 24: Distances between PDFs from the NNPDF2.3 collider and NNPDF2.3 noLHC NNLO sets.







Figure 25: Distribution of distances between PDFs from the NNPDF2.3 collider and NNPDF2.3 NLO and NNLO sets (top), and between PDFs from the NNPDF2.3 collider and NNPDF2.3 noLHC NLO and NNLO sets (bottom). The distribution with one degree of freedom is also shown.



Figure 26: Comparison of the singlet, gluon (top), triplet $u+\bar u-d-\bar d$ and sea asymmetry $\bar d-\bar u$ (bottom) at $Q^2=2$ GeV$^2$ from the NNLO NNPDF2.3 collider and NNPDF2.3 sets with $\alpha_s(M_Z)=0.119$.
5.3 The strangeness fraction of the proton



The ATLAS collaboration has recently [23] presented evidence that the strange quark distribution at low $Q^2$ and $x$ is rather larger than hitherto thought. According to this evidence, the quark sea is essentially symmetric for the specific kinematics probed by ATLAS ($Q^2=1.9$ GeV$^2$, $x=0.023$). If correct with the stated uncertainty, this result disagrees at the two sigma level with the result obtained using the NNPDF2.1 set. The ATLAS analysis combines ATLAS and HERA data, i.e. a subset of the data used to construct the NNPDF2.3 collider fit discussed in Sec. 5.2. As we have seen there, PDFs based on collider data only have very large uncertainties, so such a discrepancy seems surprising. It is all the more surprising because the strangeness distributions in the NNPDF2.1 and NNPDF2.3 sets (the latter including ATLAS data) always agree at the one sigma level, as can be seen from Fig. 10.

To check the ATLAS result, we have produced a version of the NNPDF2.3 fit based on exactly the same data set. This includes only the combined HERA-I data (HERA1AV in Tab. 2 of [7]) and the ATLAS gauge boson production data (ATLAS W and ATLAS Z in Tab. 1), with a single value of $\alpha_s(M_Z)=0.119$. We denote this PDF set as NNPDF2.3 HERA+ATLASWZ in what follows. Following Ref. [23], we define the $x$-dependent strangeness fraction

$$ r_s(x,Q^2) \equiv \frac{s(x,Q^2)+\bar s(x,Q^2)}{2\,\bar d(x,Q^2)} . \qquad (17) $$

A perhaps more significant measure of the strangeness content is the strangeness momentum fraction normalized to the light sea momentum fraction [15, 103-105],

$$ K_s(Q^2) \equiv \frac{\int_0^1 dx\, x\left(s(x,Q^2)+\bar s(x,Q^2)\right)}{\int_0^1 dx\, x\left(\bar u(x,Q^2)+\bar d(x,Q^2)\right)} , \qquad (18) $$

which is traditionally [106] taken to be $K_s \sim 0.5$ at scales of a few GeV.
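The following sketch works through the two definitions, evaluating $r_s = (s+\bar s)/(2\bar d)$ and $K_s$ for hypothetical toy antiquark distributions at a fixed scale. These are not NNPDF2.3 PDFs. The toy strangeness is chosen as $s+\bar s = \kappa(\bar u+\bar d)$, for which $K_s=\kappa$ exactly, and this serves as a check of the numerical integration. The momentum integrals use Simpson's rule in $\ln x$ down to an assumed cutoff $x_{\min}=10^{-7}$.

```python
import math

def integrate_log(f, x_min=1e-7, x_max=1.0, n=2000):
    """Simpson's rule for the integral of f(x) dx from x_min to x_max,
    carried out in y = ln x (so dx = x dy); n must be even."""
    a, b = math.log(x_min), math.log(x_max)
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(a + i * h)
        coeff = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += coeff * f(x) * x
    return total * h / 3.0

# Hypothetical toy antiquark PDFs at a fixed scale (not fitted PDFs).
def ubar(x):
    return 0.15 * x**-1.1 * max(0.0, 1.0 - x) ** 7

def dbar(x):
    return 0.18 * x**-1.1 * max(0.0, 1.0 - x) ** 7

KAPPA = 0.5  # toy assumption: s + sbar = KAPPA * (ubar + dbar)

def s_plus_sbar(x):
    return KAPPA * (ubar(x) + dbar(x))

def r_s(x):
    """Strangeness fraction (s + sbar) / (2 dbar) at a given x."""
    return s_plus_sbar(x) / (2.0 * dbar(x))

def K_s():
    """Momentum fraction of s + sbar normalized to that of ubar + dbar."""
    num = integrate_log(lambda x: x * s_plus_sbar(x))
    den = integrate_log(lambda x: x * (ubar(x) + dbar(x)))
    return num / den

print(f"r_s(0.023) = {r_s(0.023):.3f}, K_s = {K_s():.3f}")
```

The example shows that the two measures are not interchangeable. With $\bar u \neq \bar d$ the local ratio $r_s$ differs from the momentum-weighted $K_s$ even when the strangeness is a fixed fraction of the light sea.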

Ref. [24] determined the strangeness fraction $r_s(x,Q^2)$, Eq. (17), at two points in the $(x,Q^2)$ plane: $(x,Q^2)=(0.023, 1.9$ GeV$^2)$ and $(x,Q^2)=(0.013, M_Z^2)$. Fig. 27 shows $r_s(x,Q^2)$ computed as a function of $x$ at the relevant scales, using the NNPDF2.3 noLHC, NNPDF2.3 and NNPDF2.3 HERA+ATLASWZ NNLO PDF sets with $\alpha_s(M_Z)=0.119$. The strangeness fraction computed with NNPDF2.3 PDFs is clearly somewhat larger than the NNPDF2.3 noLHC one, especially at small $x$. The two are nevertheless fully consistent at the one sigma level for all values of $x$. So although the ATLAS data do push the strangeness fraction towards slightly higher values, the effect is of marginal statistical significance. When the comparison is made with the NNPDF2.3 HERA+ATLASWZ fit, the uncertainties are so large that the results become completely compatible.

Tab. 9 gives the values of the strangeness fraction at these specific values of $(x,Q^2)$, together with the strangeness momentum fraction Eq. (18). Fig. 28 represents them graphically. The values of the strangeness fraction from Ref. [24] are also shown. Note that in the latter case the low-scale value is given at $Q^2=1.9$ GeV$^2$, while for NNPDF it is given at $Q^2=2$ GeV$^2$.

The conclusions are the same as those drawn from Fig. 27: the ATLAS data favour a somewhat larger central value of the strangeness fraction. This value nevertheless remains compatible at the one sigma level with the value obtained without LHC data (NNPDF2.3




Figure 27: The strangeness fraction $r_s(x,Q^2)$, Eq. (17), computed as a function of $x$ for $Q^2=2$ GeV$^2$ (left) and $Q^2=M_Z^2$ (right) from NNPDF2.3 noLHC, NNPDF2.3 and NNPDF2.3 HERA+ATLASWZ NNLO PDFs with $\alpha_s(M_Z)=0.119$. All error bands are one sigma.



noLHC), and also consistent with the previous, slightly less accurate NNPDF2.1 value. The "standard" belief that the strange momentum fraction is of order $K_s \sim 0.5$ thus appears to be still essentially correct.

If one then attempts a determination based on HERA and ATLAS data only, the uncertainties are so large that no conclusion can be drawn. The NNPDF2.3 HERA+ATLASWZ result for $r_s$ has an uncertainty three to four times bigger than that of the ATLAS result of Ref. [24], and for this PDF set $K_s$ is essentially undetermined. The ATLAS analysis is based on the same data and differs from ours only in the fitting methodology, in particular in its use of a rather simple functional form for PDFs. It therefore appears likely that the uncertainties in the results of Ref. [24] are significantly underestimated.
Figure 28: Graphical representation of the results of Tab. 9. Note the different scale on the x axis.
PDF Set                  r_s(0.023, 2 GeV^2)     r_s(0.013, M_Z^2)
NNPDF2.1                 0.28 ± 0.09             0.71 ± 0.05
NNPDF2.3 noLHC           0.39 ± 0.10             0.76 ± 0.05
NNPDF2.3                 0.43 ± 0.11             0.78 ± 0.05
NNPDF2.3 HERA+ATLASWZ    1.2 ± 0.9               1.04 ± 0.23
ATLAS (Ref. [21])        1.00 +0.25/−0.28        … −0.10



PDF Set 


K_s(2 GeV^2)




NNPDF2.1 
NNPDF2.3 noLHC 
NNPDF2.3 


'-'■^"-0.08 

30+0 09 

'-'•"^"-'-0.08 
'-'•"J^^-0.08 


n 63+0 U4 
'-'•""J-0.05 

0.68^°:°^ 


NNPDF2.3 HERA+ATLASWZ


q 1+u.y 


1 q+0.5 
^•■^-0.6 



Table 9: The strangeness fraction Eq. (17) (top table) and strangeness momentum fraction Eq. (18) (bottom table), determined using the NNPDF2.1, NNPDF2.3 noLHC, NNPDF2.3 and NNPDF2.3 HERA+ATLASWZ NNLO PDF sets with $\alpha_s(M_Z)=0.119$. The values of Ref. [H] are also shown (note that for ATLAS the low-scale value is given at $Q^2=1.9$ GeV$^2$). The PDF uncertainties on the strangeness fraction Eq. (17) are one sigma errors, while those on the strangeness momentum fraction Eq. (18) are 68% confidence levels.
6 Phenomenology

We now discuss some phenomenological implications of the NNPDF2.3 parton set. After briefly discussing NNPDF2.3 parton luminosities, we use the PDFs to compute several LHC reference "standard candle" cross-sections.

6.1 Parton luminosities 

At a hadron collider, all factorized observables depend on parton distributions through a parton luminosity, which, following Ref. [107], we define as

$$ \Phi_{ij}\left(M_X^2\right) = \frac{1}{s}\int_\tau^1 \frac{dx_1}{x_1}\, f_i\left(x_1, M_X^2\right) f_j\left(\frac{\tau}{x_1}, M_X^2\right) , \qquad (19) $$

where $f_i(x,M_X^2)$ is a PDF and $\tau = M_X^2/s$. The parton luminosity thus contains all the information on the dependence of hadronic cross-sections on PDFs.

Fig. 29 compares parton luminosities for the LHC at 8 TeV computed using NNPDF2.1 and NNPDF2.3 PDFs at NNLO with $\alpha_s(M_Z)=0.119$. The NLO luminosities are quite similar. All the luminosities are very compatible at the one sigma level. In particular, the gluon-gluon luminosity, which is relevant for Higgs production at the LHC, is quite stable in the region corresponding to Standard Model Higgs production. The heavy quark PDFs follow the behaviour of the gluon, from which they are generated dynamically via perturbative evolution. Note that the heavy quark masses $m_c$ and $m_b$ are the same in the NLO and NNLO analyses.

In going from NNPDF2.1 to NNPDF2.3, the uncertainty on the gluon-gluon luminosity is reduced somewhat at larger final-state invariant masses, while the $q\bar q$ luminosity is somewhat smaller in the same region. As discussed in Sec. 4.3, the former effect is due both to the improved genetic algorithm minimization and to the impact of the ATLAS inclusive jet data. The latter is due to the impact of the LHC electroweak vector boson production data.
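A minimal sketch of the luminosity integral follows. It is written for hypothetical, scale-independent toy PDFs; the functions `gluon` and `quark` below are assumptions for illustration, not NNPDF2.3 parametrizations. The code implements $\Phi_{ij}(M_X^2) = \frac{1}{s}\int_\tau^1 \frac{dx_1}{x_1} f_i(x_1) f_j(\tau/x_1)$ with $\tau=M_X^2/s$. It uses Simpson's rule in $y=\ln x_1$, so that $dx_1/x_1 = dy$. The substitution $x_1\to\tau/x_1$ swaps the two PDFs, so $\Phi_{ij}=\Phi_{ji}$, which provides a simple check.

```python
import math

def luminosity(f_i, f_j, m_x, sqrt_s, n=2000):
    """Parton luminosity Phi_ij(M_X^2) in GeV^-2 for toy PDFs f(x).

    Phi_ij = (1/s) * int_tau^1 dx1/x1 f_i(x1) f_j(tau/x1), tau = M_X^2/s,
    evaluated with Simpson's rule in y = ln x1 on [ln tau, 0]; n must be even.
    """
    s = sqrt_s ** 2
    tau = m_x ** 2 / s
    a = math.log(tau)
    h = -a / n
    total = 0.0
    for k in range(n + 1):
        x1 = math.exp(a + k * h)
        coeff = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += coeff * f_i(x1) * f_j(tau / x1)
    return total * h / 3.0 / s

# Hypothetical toy PDFs with no scale dependence (max() guards against x > 1
# from floating-point rounding at the endpoints).
def gluon(x):
    return 3.0 * x**-1.2 * max(0.0, 1.0 - x) ** 5

def quark(x):
    return 0.6 * x**-1.1 * max(0.0, 1.0 - x) ** 3

for m_x in (125.0, 500.0, 1000.0):
    print(m_x, luminosity(gluon, gluon, m_x, 8000.0))
```

A real computation would replace the toy functions with PDFs evaluated at the scale $M_X^2$.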




Figure 29: Comparison of the parton luminosities for the LHC at 8 TeV, computed using the NNPDF2.1 and NNPDF2.3 NNLO PDFs with $N_{\rm rep}=100$ replicas from each set. From left to right we show $\Phi_{gg}$ (top), $\Phi_{q\bar q}$, $\Phi_{c\bar c}$ (middle), and $\Phi_{b\bar b}$, $\Phi_{qg}$ (bottom). All luminosities are plotted as ratios to the NNPDF2.3 NNLO central value. All uncertainties shown are one sigma.
                     σ(W+)                           σ(W−)                           σ(Z0)
PDF set              NLO             NNLO            NLO             NNLO            NLO             NNLO
NNPDF2.1             5.891 ± 0.133   6.198 ± 0.098   4.015 ± 0.091   4.213 ± 0.068   0.928 ± 0.018   0.972 ± 0.013
NNPDF2.3             5.854 ± 0.076   6.122 ± 0.078   4.057 ± 0.053   4.202 ± 0.059   0.931 ± 0.011   0.968 ± 0.012
NNPDF2.3 noLHC       5.886 ± 0.082   6.198 ± 0.109   4.039 ± 0.066   4.218 ± 0.072   0.931 ± 0.012   0.974 ± 0.015
NNPDF2.3 collider    5.845 ± 0.104   6.127 ± 0.107   4.071 ± 0.074   4.234 ± 0.068   0.953 ± 0.016   1.000 ± 0.020

                     σ(W)/σ(Z0)                      σ(W+)/σ(W−)
PDF set              NLO             NNLO            NLO             NNLO
NNPDF2.1             10.678 ± 0.031  10.707 ± 0.034  1.467 ± 0.018   1.471 ± 0.021
NNPDF2.3             10.650 ± 0.022  10.669 ± 0.034  1.443 ± 0.009   1.457 ± 0.013
NNPDF2.3 noLHC       10.662 ± 0.025  10.692 ± 0.026  1.457 ± 0.022   1.470 ± 0.020
NNPDF2.3 collider    10.403 ± 0.131  10.359 ± 0.124  1.436 ± 0.014   1.447 ± 0.015



Table 10: Total cross-sections (in nb) for W and Z production at the LHC at $\sqrt{s}=7$ TeV, and their ratios. All uncertainties shown are one sigma. Branching ratios are included in the cross-sections.





                     σ(tt̄)                           σ(H)
PDF set              NLO             NNLO            NLO             NNLO
NNPDF2.1             160.1 ± 5.4     158.6 ± 4.4     11.40 ± 0.18    15.22 ± 0.22
NNPDF2.3             158.3 ± 4.0     157.1 ± 4.2     11.46 ± 0.13    15.31 ± 0.20
NNPDF2.3 noLHC       158.4 ± 4.3     157.1 ± 4.7     11.48 ± 0.14    15.30 ± 0.17
NNPDF2.3 collider    151.2 ± 6.1     150.0 ± 5.3     10.80 ± 0.25    14.44 ± 0.27



Table 11: Total cross-sections (in pb) for top quark pair production and for Higgs production in gluon fusion at the LHC at $\sqrt{s}=7$ TeV. All uncertainties shown are one sigma.



6.2 Total cross-sections 

We now present results for several benchmark total cross-sections at the LHC at NLO and NNLO, using NNPDF2.3 PDFs and, for comparison, NNPDF2.1 PDFs, with $\alpha_s(M_Z)=0.119$, at $\sqrt{s}=7$ TeV and $\sqrt{s}=8$ TeV. We determine the following observables:

• electroweak gauge boson production total cross-sections and $W^+/W^-$ and $W/Z$ cross-section ratios, computed with the Vrap code [108] at scale $Q=M_V$;

• the top pair production total cross-section, computed with the top++ code [109] at $Q=m_t$. At NNLO the approximate cross-sections of Ref. [110] are used. The settings are the default ones of Ref. [23] and are the same for the calculations with NLO and NNLO PDFs; in particular, the top quark mass is taken to be $m_t=173.3$ GeV;

• Standard Model Higgs boson production cross-sections in gluon fusion with $m_H=125$ GeV, computed with the iHixs code [111] at $Q=m_H$.

All uncertainties shown are PDF uncertainties only. In particular, they include neither the uncertainty due to the variation of $\alpha_s$ nor theoretical uncertainties such as that due to missing higher orders, which is usually estimated by varying the renormalization and factorization scales.
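In the Monte Carlo approach used here, the central value and one-sigma PDF uncertainty of any observable are the mean and standard deviation of its values computed replica by replica. For ratios such as $W/Z$, this automatically retains the strong correlation between numerator and denominator. This is why the ratio uncertainties in Tabs. 10 and 12 are far smaller than uncorrelated error propagation would give. The sketch below illustrates this with synthetic, hypothetical replica predictions. `mc_stats` is an illustrative helper, not part of any NNPDF tool.

```python
import numpy as np

def mc_stats(values):
    """Central value and one-sigma uncertainty: mean and std over replicas."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))

rng = np.random.default_rng(4)
n_rep = 100
# Hypothetical replica predictions (nb). The shared factor mimics the strong
# PDF correlation between the W and Z cross-sections.
common = rng.normal(1.0, 0.02, n_rep)
sig_w = 10.3 * common * (1.0 + rng.normal(0.0, 0.002, n_rep))
sig_z = 0.97 * common * (1.0 + rng.normal(0.0, 0.002, n_rep))

w_c, w_e = mc_stats(sig_w)
z_c, z_e = mc_stats(sig_z)
r_c, r_e = mc_stats(sig_w / sig_z)   # ratio computed replica by replica

# Uncorrelated error propagation, shown only for comparison.
naive = (w_c / z_c) * np.hypot(w_e / w_c, z_e / z_c)
print(f"W/Z = {r_c:.3f} +- {r_e:.3f} (uncorrelated propagation: +- {naive:.3f})")
```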
                     σ(W+)                           σ(W−)                           σ(Z0)
PDF set              NLO             NNLO            NLO             NNLO            NLO             NNLO
NNPDF2.1             6.759 ± 0.152   7.116 ± 0.112   4.683 ± 0.103   4.918 ± 0.078   1.078 ± 0.021   1.131 ± 0.015
NNPDF2.3             6.718 ± 0.087   7.033 ± 0.090   4.731 ± 0.061   4.907 ± 0.067   1.082 ± 0.013   1.127 ± 0.013
NNPDF2.3 noLHC       6.756 ± 0.090   7.120 ± 0.124   4.715 ± 0.074   4.927 ± 0.082   1.083 ± 0.013   1.134 ± 0.017
NNPDF2.3 collider    6.734 ± 0.110   7.066 ± 0.117   4.760 ± 0.080   4.957 ± 0.076   1.111 ± 0.019   1.168 ± 0.023

                     σ(W)/σ(Z0)                      σ(W+)/σ(W−)
PDF set              NLO             NNLO            NLO             NNLO
NNPDF2.1             10.611 ± 0.033  10.639 ± 0.035  1.443 ± 0.017   1.447 ± 0.019
NNPDF2.3             10.580 ± 0.023  10.598 ± 0.035  1.420 ± 0.008   1.433 ± 0.012
NNPDF2.3 noLHC       10.591 ± 0.026  10.622 ± 0.028  1.433 ± 0.019   1.445 ± 0.018
NNPDF2.3 collider    10.347 ± 0.128  10.299 ± 0.124  1.415 ± 0.014   1.425 ± 0.013



Table 12: Same as Tab. 10, but for $\sqrt{s}=8$ TeV.





                     σ(tt̄)                           σ(H)
PDF set              NLO             NNLO            NLO             NNLO
NNPDF2.1             229.5 ± 6.9     226.8 ± 5.8     14.52 ± 0.21    19.42 ± 0.26
NNPDF2.3             227.3 ± 5.1     225.1 ± 5.5     14.61 ± 0.16    19.54 ± 0.25
NNPDF2.3 noLHC       227.5 ± 5.6     225.1 ± 6.0     14.65 ± 0.17    19.53 ± 0.21
NNPDF2.3 collider    216.8 ± 8.1     214.6 ± 7.0     13.81 ± 0.30    18.48 ± 0.32



Table 13: Same as Tab. 11, but for $\sqrt{s}=8$ TeV.



Results are collected in Tabs. 10 and 11 ($\sqrt{s}=7$ TeV) and in Tabs. 12 and 13 ($\sqrt{s}=8$ TeV), and represented graphically in Fig. 30. Everything is consistent within uncertainties, but for all observables the accuracy clearly increases in going from NNPDF2.1 to NNPDF2.3. Comparison with the results obtained using the NNPDF2.3 noLHC sets shows that the improvement is partly due to the improved methodology. The LHC data also have a visible impact on both the central value and the uncertainty. Results obtained using the collider-only set are not yet competitive, even for these very inclusive observables. The exceptions are quantities such as the $W^+/W^-$ ratio, which are determined primarily by collider data.
Figure 30: Graphical representation of the results of Tabs. 10 to 13.
7 Conclusions



The NNPDF2.3 PDF set is the first global PDF set to include systematically all relevant LHC data, and it is thus arguably the most accurate PDF set currently available. LHC data are likely to play an increasing role in future refinements of PDF sets. Internal inconsistencies were noticed long ago in fixed-target DIS data \W\, and there are tensions between fixed-target and collider data, some of which have been discussed here in Sec. 5. Both suggest that a reliable PDF determination should avoid low-energy data and data obtained using nuclear targets, and should thus be based only on lepton-hadron or hadron-hadron collider data. Current collider data are not yet sufficient to give a competitive PDF determination by themselves (see Sec. 5). However, this situation is likely to evolve very rapidly thanks to the excellent performance of the LHC and its experiments. The data on the usual inclusive processes will become more precise, and new processes will be incorporated, such as W boson production in association with charm quarks for strangeness determination [62, 112]. Cleaner data are in turn likely to stimulate further improvements in the computational and statistical methodologies used for their analysis, and in the theoretical framework used to describe them.

All the NNPDF2.3 PDF sets discussed in this work are available from the NNPDF web site,

http://nnpdf.hepforge.org/

and through the LHAPDF interface [113]. The NNPDF web site also provides a Mathematica interface and a more complete selection of PDF plots.

Specifically, the new PDF sets produced in the present analysis and available in LHAPDF are the following:

• NNPDF2.3 NLO and NNLO sets of $N_{\rm rep}=100$ replicas, provided for all values of $\alpha_s$ from 0.114 to 0.124 in steps of $\delta\alpha_s=0.001$:
NNPDF23_nlo_as_0114.LHgrid, ..., NNPDF23_nlo_as_0124.LHgrid;
NNPDF23_nnlo_as_0114.LHgrid, ..., NNPDF23_nnlo_as_0124.LHgrid;

• NNPDF2.3 NLO and NNLO PDF sets based on reduced data sets, provided for all values of $\alpha_s$ from 0.116 to 0.120 in steps of 0.001:

NNPDF23_nlo_noLHC_as_0116.LHgrid, ..., NNPDF23_nlo_noLHC_as_0120.LHgrid;
NNPDF23_nnlo_noLHC_as_0116.LHgrid, ..., NNPDF23_nnlo_noLHC_as_0120.LHgrid;
NNPDF23_nlo_collider_as_0116.LHgrid, ..., NNPDF23_nlo_collider_as_0120.LHgrid;
NNPDF23_nnlo_collider_as_0116.LHgrid, ..., NNPDF23_nnlo_collider_as_0120.LHgr

• NNPDF2.3 NLO and NNLO PDF sets in the $n_f=4$ and $n_f=5$ schemes (the number of active flavors only increases up to the given value), provided for all values of $\alpha_s$ from 0.116 to 0.120 in steps of 0.001:

NNPDF23_nlo_FFN_NF4_as_0116.LHgrid, ..., NNPDF23_nlo_FFN_NF4_as_0120.LHgrid;
NNPDF23_nnlo_FFN_NF4_as_0116.LHgrid, ..., NNPDF23_nnlo_FFN_NF4_as_0120.LHgrid;
NNPDF23_nlo_FFN_NF5_as_0116.LHgrid, ..., NNPDF23_nlo_FFN_NF5_as_0120.LHgrid;
NNPDF23_nnlo_FFN_NF5_as_0116.LHgrid, ..., NNPDF23_nnlo_FFN_NF5_as_0120.LHgrid;